Method for isolating a polynucleotide of interest from the genome of a mycobacterium using a BAC-based DNA library, application to the detection of mycobacteria

ABSTRACT

A method for isolating a polynucleotide of interest that is present in the genome of a first mycobacterium strain and/or is expressed by the first mycobacterium strain, where the polynucleotide of interest is also absent or altered in the genome of a second mycobacterium strain and/or is not expressed in the second mycobacterium strain. The method includes (a) contacting the genomic DNA of the first mycobacterium strain under hybridizing conditions with the DNA of at least one clone that belongs to a bacterial artificial chromosome (BAC) genomic DNA library of the second mycobacterium strain, and (b) isolating the polynucleotide of interest that does not form a hybrid with the DNA of the second mycobacterium strain. This invention further pertains to a Mycobacterium tuberculosis strain H37Rv genomic DNA library, as well as a Mycobacterium bovis BCG strain Pasteur genomic DNA library, and the recombinant BAC vectors that belong to those genomic DNA libraries. This invention also relates to mycobacterial nucleic acids, and methods and kits for using these nucleic acids to detect mycobacteria in a biological sample.

This application is a 371 of PCT/IB99/00740, filed Apr. 16, 1999, which is a continuation of U.S. patent application Ser. No. 09/060,756, now U.S. Pat. No. 6,183,957, filed Apr. 16, 1998.

I. BACKGROUND OF THE INVENTION

The present invention pertains to a method for isolating a polynucleotide of interest that is present in the genome of a mycobacterium strain and/or is expressed by said mycobacterium strain and that is absent or altered in the genome of a different mycobacterium strain and/or is not expressed in said different mycobacterium strain, said method comprising the use of at least one clone belonging to a genomic DNA library of a given mycobacterium strain, said DNA library being cloned in a bacterial artificial chromosome (BAC). The invention also concerns polynucleotides identified by the above method, as well as detection methods for mycobacteria, particularly Mycobacterium tuberculosis, and kits using said polynucleotides as primers or probes. Finally, the invention deals with BAC-based mycobacterium DNA libraries used in the method according to the invention and particularly BAC-based Mycobacterium tuberculosis and Mycobacterium bovis BCG DNA libraries.

Radical measures are required to prevent the grim predictions of the World Health Organisation for the evolution of the global tuberculosis epidemic in the next century from becoming a tragic reality. The powerful combination of genomics and bioinformatics is providing a wealth of information about the etiologic agent, Mycobacterium tuberculosis, that will facilitate the conception and development of new therapies. The starting point for genome sequencing was the integrated map of the 4.4 Mb circular chromosome of the widely used, virulent reference strain M. tuberculosis H37Rv, and appropriate cosmids were subjected to systematic shotgun sequence analysis at the Sanger Centre.

Cosmid clones (Balasubramanian et al., 1996; Pavelka et al., 1996) have played a crucial role in the M. tuberculosis H37Rv genome sequencing project. However, problems such as under-representation of certain regions of the chromosome, unstable inserts and the relatively small insert size complicated the production of a comprehensive set of canonical cosmids representing the entire genome.

II. SUMMARY OF THE INVENTION

In order to avoid the numerous technical constraints encountered in the state of the art, as described hereabove, when using genomic mycobacterial DNA libraries constructed in cosmid clones, the inventors have attempted to construct genomic mycobacterial DNA libraries in an alternative type of vector, namely Bacterial Artificial Chromosome (BAC) vectors.

The success of this approach depended on whether the resulting BAC clones could maintain large mycobacterial DNA inserts. There are various reports describing the successful construction of a BAC library for eucaryotic organisms (Cai et al., 1995; Kim et al., 1996; Misumi et al., 1997; Woo et al., 1994; Zimmer et al., 1997) where inserts up to 725 kb (Zimmer et al., 1997) were cloned and stably maintained in the E. coli host strain.

Here, it is shown that, surprisingly, the BAC system can also be used for mycobacterial DNA, as 70% of the clones contained inserts in the size range of 25 to 104 kb.

This is the first time that bacterial, and specifically mycobacterial, DNA has been cloned in such BAC vectors.

In an attempt to obtain complete coverage of the genome with a minimal overlapping set of clones, a Bacterial Artificial Chromosome (BAC) library of M. tuberculosis was constructed, using the vector pBeloBAC11 (Kim et al., 1996), which combines a simple phenotypic screen for recombinant clones with the stable propagation of large inserts (Shizuya et al., 1992). The BAC cloning system is based on the E. coli F-factor, whose replication is strictly controlled and thus ensures stable maintenance of large constructs (Willets et al., 1987). BACs have been widely used for cloning of DNA from various eucaryotic species (Cai et al., 1995; Kim et al., 1996; Misumi et al., 1997; Woo et al., 1994; Zimmer et al., 1997). In contrast, to our knowledge this report describes the first attempt to use the BAC system for cloning bacterial DNA.

A central advantage of the BAC cloning system over cosmid vectors used in the prior art is that the F-plasmid is present in only one or a maximum of two copies per cell, reducing the potential for recombination between DNA fragments and, more importantly, avoiding the lethal overexpression of cloned bacterial genes. However, the presence of the BAC as just a single copy means that plasmid DNA has to be extracted from a large volume of culture to obtain sufficient DNA for sequencing, and a simplified protocol to achieve this is described here in the examples.

Further, the stability and fidelity of maintenance of the clones in the BAC library represent ideal characteristics for the identification of genomic differences possibly responsible for phenotypic variations in different mycobacterial species.

As will be shown herein, BACs can be allied with conventional hybridization techniques for refined analyses of genomes and transcriptional activity from different mycobacterial species.

Having established a reliable procedure to screen for genomic polymorphisms, it is now possible to conduct these comparisons on a more systematic basis than in the prior art, using representative BACs throughout the chromosome and genomic DNA from a variety of mycobacterial species.

As another approach to display genomic polymorphisms, the inventors have also started to use selected H37Rv BACs for “molecular combing” experiments in combination with fluorescent in situ hybridization (Bensimon et al., 1994; Michalet et al., 1997). With such techniques the one skilled in the art is enabled to explore the genome of mycobacteria in general, and of M. tuberculosis in particular, for further polymorphic regions.

The availability of BAC-based genomic mycobacterial DNA libraries constructed by the inventors has allowed them to design methods and means useful both to identify genomic regions of interest of pathogenic mycobacteria, such as Mycobacterium tuberculosis, that have no counterpart in the corresponding non-pathogenic strains, such as Mycobacterium bovis BCG, and to detect the presence of polynucleotides belonging to a specific mycobacterium strain in a biological sample.

A biological sample according to the present invention is notably intended to mean a biological fluid, such as plasma, blood, urine or saliva, or a tissue, such as a biopsy.

Thus, a first object of the invention consists of a method for isolating a polynucleotide of interest that is present in the genome of a mycobacterium strain and/or is expressed by said mycobacterium strain and that is absent or altered in the genome of a different mycobacterium strain and/or is not expressed in said different mycobacterium strain, said method comprising the use of at least one clone belonging to a genomic DNA library of a given mycobacterium strain, said DNA library being cloned in a bacterial artificial chromosome (BAC).

The invention is also directed to a polynucleotide of interest that has been isolated according to the above method and in particular a polynucleotide containing one or several Open Reading Frames (ORFs), for example ORFs encoding a polypeptide involved in the pathogenicity of a mycobacterium strain or ORFs encoding Polymorphic Glycine Rich Sequences (PGRS).

Such polynucleotides of interest may serve as probes or primers in order to detect the presence of a specific mycobacterium strain in a biological sample or to detect the expression of specific genes in a particular mycobacterial strain of interest.

The BAC-based genomic mycobacterial DNA libraries generated by the present inventors are also part of the invention, as well as each of the recombinant BAC clones and the DNA insert contained in each of said recombinant BAC clones.

The invention also pertains to methods and kits for detecting a specific mycobacterium in a biological sample using either at least one recombinant BAC clone or at least one polynucleotide according to the invention, as well as to methods and kits to detect the expression of one or several specific genes of a given mycobacterial strain present in a biological sample.

III. BRIEF DESCRIPTION OF THE FIGURES

In order to better understand the present invention, reference will be made to the appended figures, which depict specific embodiments to which the scope of the present invention is in no case limited.

FIGS. 1A and 1B: PCR-screening for unique BAC clones with specific primers for 2 selected genomic regions of the H37Rv chromosome, using 21 pools representing 2016 BACs (FIG. 1A) and sets of 20 subpools from selected positive pools (FIG. 1B).

FIG. 2: Pulsed-field gel electrophoresis gel of DraI-cleaved BAC clones used for estimating the insert sizes of BACs.

FIG. 3: Minimal overlapping BAC map of M. tuberculosis H37Rv superimposed on the integrated physical and genetic map established by Philipp et al. (18). Y- and I- numbers show pYUB328 (2) and pYUB412 (16) cosmids which were shotgun sequenced during the H37Rv genome sequencing project. Y-cosmids marked with * were shown in the integrated physical and genetic map (18). Rv numbers show the position of representative BAC clones relative to sequenced Y- and I- clones. Squared Rv numbers show BACs which were shotgun sequenced at the Sanger Centre.

FIGS. 4A and 4B: Ethidium bromide stained gel (FIG. 4A) and corresponding Southern blot (FIG. 4B) of EcoRI and PvuII digested Rv58 DNA hybridized with ³²P-labeled genomic DNA preparations from M. tuberculosis H37Rv, M. bovis ATCC 19210 and M. bovis BCG Pasteur.

FIG. 5: Organisation of the ORFs in the 12.7 kb genomic region present in M. tuberculosis H37Rv but not present in M. bovis ATCC 19210 and M. bovis BCG Pasteur. Arrows show the direction of transcription of the putative genes. Positions of EcoRI and PvuII restriction sites are shown. Vertical dashes represent stop codons. The 11 ORFs correspond to the ORFs MTCY277.28 to MTCY277.38 (accession number Z79701, EMBL Nucleotide Sequence Data Library). The junction sequences flanking the polymorphic region are shown.

FIG. 6: Variation in the C-terminal part of a PE-PGRS open reading frame in M. tuberculosis strain H37Rv relative to M. bovis BCG strain Pasteur.

The numbers on the right side of the Figure denote the position of the end nucleotides, taking as the reference the M. tuberculosis genome.

FIG. 7: Polynucleotide sequence next to the HindIII cloning site in the BAC vector pBeloBAC11 (Kim et al., 1996) used to clone the inserts of the BAC-based mycobacterial genomic DNA library according to the invention.

NotI: location of the NotI restriction sites.

Primer T7-BAC1: nucleotide region recognized by the T7-BAC1 primer shown in Table 1.

T7 promoter: location of the T7 promoter region on the pBeloBAC11 vector.

Primer T7-Belo2: nucleotide region recognized by the T7-Belo2 primer shown in Table 1.

HindIII: the HindIII cloning site used to clone the genomic inserts in the pBeloBAC11 vector.

SP6-Mid primer: nucleotide region recognized by the SP6-Mid primer shown in Table 1.

SP6-BAC1 primer: nucleotide region recognized by the SP6-BAC1 primer shown in Table 1.

SP6 promoter: location of the SP6 promoter region on the pBeloBAC11 vector.

IV. DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

As already mentioned hereinbefore, the present invention is directed to a method for isolating a polynucleotide of interest that is present in the genome of a mycobacterium strain and/or is expressed by said mycobacterium strain and that is absent or altered in the genome of a different mycobacterium strain and/or is not expressed in said different mycobacterium strain, said method comprising the use of at least one clone belonging to a genomic DNA library of a given mycobacterium strain, said DNA library being cloned in a bacterial artificial chromosome (BAC) type vector.

For this purpose, the inventors have constructed several BAC-based mycobacterial genomic DNA libraries that may be used in order to perform the above described method.

Because it is the first time that mycobacterial genomic DNA has been successfully cloned in BAC type vectors, and because these DNA libraries are therefore novel and nonobvious, an object of the present invention consists in a mycobacterial genomic DNA library cloned in such a BAC type vector.

As an illustrative example, a BAC-based DNA library of Mycobacterium tuberculosis has been constructed. Forty-seven cosmids chosen from the integrated map of the 4.4 Mb circular chromosome (Philipp et al., 1996a) were shotgun-sequenced during the initial phase of the H37Rv genome sequence project. The sequences of these clones were used as landmarks in the construction of a minimally overlapping BAC map. Comparison of the sequence data from the termini of 420 BAC clones allowed us to establish a minimal overlapping BAC map and to fill in the existing gaps between the sequences of cosmids. As well as using the BAC library for genomic mapping and sequencing, we also tested the system in comparative genomic experiments in order to uncover differences between two closely related mycobacterial species. As shown in a previous study (Philipp et al., 1996b), M. tuberculosis, M. bovis and M. bovis BCG, specifically the BCG Pasteur strain, exhibit a high level of global genomic conservation, but certain polymorphic regions were also detected. Therefore, it was of great interest to find a reliable, easy and rapid way to exactly localize polymorphic regions in mycobacterial genomes using selected BAC clones. This approach was validated by determining the exact size and location of the polymorphisms in the genomic region of DraI fragment Z4 (Philipp et al., 1996b), taking advantage of the availability of an appropriate BAC clone covering the polymorphic region and the H37Rv genome sequence data. This region is located approximately 1.7 Mb from the origin of replication.

The Bacterial Artificial Chromosome (BAC) cloning system is capable of stably propagating large, complex DNA inserts in Escherichia coli. As part of the Mycobacterium tuberculosis H37Rv genome sequencing project, a BAC library was constructed in the pBeloBAC11 vector and used for genome mapping, confirmation of sequence assembly, and sequencing. The library contains about 5000 BAC clones, with inserts ranging in size from 25 to 104 kb, representing theoretically a 70-fold coverage of the M. tuberculosis genome (4.4 Mb). A total of 840 sequences from the T7 and SP6 termini of 420 BACs were determined and compared to those of a partial genomic database. These sequences showed excellent correlation between the estimated sizes and positions of the BAC clones and the sizes and positions of previously sequenced cosmids and the resulting contigs. Many BAC clones represent linking clones between sequenced cosmids, allowing full coverage of the H37Rv chromosome, and they are now being shotgun-sequenced in the framework of the H37Rv sequencing project. Also, no chimeric, deleted or rearranged BAC clones were detected, which was of major importance for the correct mapping and assembly of the H37Rv sequence. The minimal overlapping set contains 68 unique BAC clones and spans the whole H37Rv chromosome with the exception of a single gap of ~150 kb. As a post-genomic application, the canonical BAC set was used in a comparative study to reveal chromosomal polymorphisms between M. tuberculosis, M. bovis and M. bovis BCG Pasteur, and a novel 12.7 kb segment present in M. tuberculosis but absent from M. bovis and M. bovis BCG was characterized. This region contains a set of genes whose products show low similarity to proteins involved in polysaccharide biosynthesis. The H37Rv BAC library therefore provides the one skilled in the art with a powerful tool both for the generation and confirmation of sequence data and for comparative genomics and a plurality of post-genomic applications.
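The theoretical coverage figure quoted above follows from simple arithmetic: coverage is the number of clones multiplied by the mean insert size, divided by the genome size. The short Python sketch below is only an illustration of that relation; the function names are hypothetical, and the mean insert size is not stated in the text, so it is back-calculated here from the stated clone count, coverage and genome size.

```python
# Illustrative arithmetic only; function names are hypothetical.

def fold_coverage(n_clones, mean_insert_kb, genome_kb):
    """Theoretical fold coverage of a genome by a clone library."""
    return n_clones * mean_insert_kb / genome_kb


def implied_mean_insert_kb(n_clones, coverage, genome_kb):
    """Mean insert size (kb) implied by a stated fold coverage."""
    return coverage * genome_kb / n_clones


# Figures taken from the description: about 5000 clones,
# 70-fold coverage, 4.4 Mb (4400 kb) genome.
mean_kb = implied_mean_insert_kb(5000, 70, 4400)
print(f"implied mean insert size: {mean_kb:.1f} kb")
print(f"coverage check: {fold_coverage(5000, mean_kb, 4400):.0f}-fold")
```

The implied mean insert size falls inside the 25 to 104 kb range reported for the library, so the stated figures are mutually consistent.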

The above described BAC-based Mycobacterium tuberculosis genomic DNA library is part of the present invention and was deposited in the Collection Nationale de Cultures de Microorganismes (CNCM) on Nov. 19, 1997 under the accession number I-1945.

Another BAC-based DNA library has been constructed with the genomic DNA of Mycobacterium bovis BCG, Pasteur strain, and said DNA library was deposited in the Collection Nationale de Cultures de Microorganismes (CNCM) on Jun. 30, 1998 under the accession number I-2049.

Thus, as a specific embodiment of the above described method for isolating a polynucleotide of interest, said method makes use of at least one BAC-based DNA library that has been constructed from the genomic DNA of Mycobacterium tuberculosis, more specifically of the H37Rv strain, and particularly of the DNA library deposited under the accession number I-1945.

In another specific embodiment of the above described method for isolating a polynucleotide of interest, said method makes use of at least one BAC-based DNA library that has been constructed from the genomic DNA of Mycobacterium bovis BCG, more specifically of the Pasteur strain, and particularly of the DNA library deposited under the accession number I-2049.

In more detail, the method according to the invention for isolating a polynucleotide of interest may comprise the following steps:

a) isolating at least one polynucleotide contained in a clone of a BAC-based DNA library of mycobacterial origin;

b) isolating:

at least one genomic or cDNA polynucleotide from a mycobacterium, said mycobacterium belonging to a strain different from the strain used to construct the BAC-based DNA library of step a); or alternatively

at least one polynucleotide contained in a clone of a BAC-based DNA library prepared from the genome of a mycobacterium that is different from the mycobacterium used to construct the BAC-based DNA library of step a);

c) hybridizing the at least one polynucleotide of step a) to the at least one polynucleotide of step b);

d) selecting the at least one polynucleotide of step a) that has not formed a hybrid complex with the at least one polynucleotide of step b);

e) characterizing the selected polynucleotide.

Following the above procedure, the at least one polynucleotide of step a) may be prepared as follows:

1) digesting at least one recombinant BAC clone with an appropriate restriction endonuclease in order to isolate the polynucleotide insert of interest from the vector genetic material;

2) optionally amplifying the resulting polynucleotide insert;

3) optionally digesting the polynucleotide insert of step 1) or step 2) with at least one restriction endonuclease.

The above method of the invention allows the one skilled in the art to perform comparative genomics between different strains or species of mycobacteria, for example between pathogenic strains or species and their non-pathogenic counterparts, as is the case for the genomic comparison between Mycobacterium tuberculosis and Mycobacterium bovis BCG that is described herein in the examples.

Restriction digests of a given clone of a BAC library according to the invention may be blotted to membranes, and then probed with radiolabeled DNA from another strain or another species of mycobacteria, allowing the one skilled in the art to identify, characterize and isolate a polynucleotide of interest that may be involved in important metabolic and/or physiological pathways of the mycobacterium under testing, such as a polynucleotide functionally involved in the pathogenicity of said given mycobacterium for its host organism.

More specifically, the inventors have shown in Example 6 that when restriction digests of a given clone of the BAC library identified by the CNCM accession number I-1945 are blotted to membranes and then probed with radiolabeled total genomic DNA from, for example, Mycobacterium bovis BCG Pasteur, restriction fragments that fail to hybridize with the M. bovis BCG Pasteur DNA are absent from its genome, hence identifying polymorphic regions between M. bovis BCG Pasteur and M. tuberculosis H37Rv.
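The logic of this blot-and-probe comparison can be illustrated by a toy in silico analogue. The Python sketch below is not the experimental procedure: it assumes that exact substring matching may stand in for hybridization, uses short invented sequences, and models only EcoRI (recognition site GAATTC, cut after the first G). The function names `digest` and `non_hybridizing` are hypothetical.

```python
# Toy analogue of the digest / blot / probe comparison.
# Assumption: exact substring matching stands in for hybridization.

def digest(seq, site="GAATTC", cut_offset=1):
    """Cut seq at every occurrence of site (EcoRI cuts G^AATTC)."""
    fragments, start = [], 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return [f for f in fragments if f]


def non_hybridizing(fragments, other_genome):
    """Fragments with no exact counterpart in the other genome."""
    return [f for f in fragments if f not in other_genome]


# Invented sequences: the clone carries a segment absent from the
# second genome, flanked by two EcoRI sites.
clone = "ACGTACGT" + "GAATTC" + "TTTTCCCCGGGG" + "GAATTC" + "CATGCATG"
other = "ACGTACGT" + "GAATTC" + "CATGCATG"

fragments = digest(clone)
print(fragments)
print(non_hybridizing(fragments, other))
```

Only the fragment carrying the extra segment is reported, mirroring the way a restriction fragment without signal on the blot marks a polymorphic region.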

Thus, a further object of the present invention consists in a polynucleotide of interest that has been isolated according to the method described hereinbefore.

In Example 6, a polynucleotide of approximately 12.7 kilobases has been isolated that is present in the genome of M. tuberculosis but is absent from the genome of M. bovis BCG. This polynucleotide of interest contains 11 ORFs that may be involved in polysaccharide biosynthesis. Two of said ORFs are of particular interest: ORF6 (MTCY277.33; Rv1511) encodes a protein that shares significant homology with bacterial GDP-D-mannose dehydratases, whereas the protein encoded by ORF7 (MTCY277.34; Rv1512) shares significant homology with a nucleotide sugar epimerase. As polysaccharide is a major constituent of the mycobacterial cell wall, these deleted genes may cause the cell wall of M. bovis BCG to differ from that of M. tuberculosis, a fact that may have important consequences for both the immune response to M. bovis BCG and virulence. Detection of such a polysaccharide is of diagnostic interest and possibly useful in the design of tuberculosis vaccines.

Consequently, the polynucleotide of interest obtained following the method according to the invention may contain at least one ORF, said ORF preferably encoding all or part of a polypeptide involved in an important metabolic and/or physiological pathway of the mycobacteria under testing, and more specifically all or part of a polypeptide that is involved in the pathogenicity of the mycobacteria under testing, such as for example Mycobacterium tuberculosis, and more generally mycobacteria belonging to the Mycobacterium tuberculosis complex.

The Mycobacterium tuberculosis complex has its usual meaning, i.e. the complex of mycobacteria causing tuberculosis, which are Mycobacterium tuberculosis, Mycobacterium bovis, Mycobacterium africanum, Mycobacterium microti and the vaccine strain Mycobacterium bovis BCG.

An illustrative polynucleotide of interest according to the present invention comprises all or part of the polynucleotide of approximately 12.7 kilobases that is present in the genome of M. tuberculosis but is absent from the genome of M. bovis BCG disclosed hereinbefore. This polynucleotide is contained in clone Rv58 of the BAC DNA library I-1945.

Generally, the invention also pertains to a purified polynucleotide comprising the DNA insert contained in a recombinant BAC vector belonging to a BAC-based mycobacterial genomic DNA library, such as for example the I-1945 BAC DNA library.

Advantageously, such a polynucleotide has been identified according to the method of the invention.

Such a polynucleotide of interest may be used as a probe or a primer useful for specifically detecting a given mycobacterium of interest, such as Mycobacterium tuberculosis or Mycobacterium bovis BCG.

More specifically, the invention then deals with a purified polynucleotide useful as a probe or a primer comprising all or part of the nucleotide sequence SEQ ID NO. 1.

The location, on the Mycobacterium tuberculosis chromosome, of the above polynucleotide of sequence SEQ ID NO. 1 has now been ascribed to begin, at its 5′ end, at nucleotide position nt 1696015 and to end, at its 3′ end, at nucleotide position nt 1708746.

For diagnostic purposes, this 12.7 kb deletion should allow a rapid PCR screening of tubercle isolates to identify whether they are bovine or human strains. The primers listed in Table I flank the deleted region and give a 722 bp amplicon in M. bovis or M. bovis BCG strains, but a fragment of 13,453 bp in M. tuberculosis that is practically impossible to amplify under the same PCR conditions. More importantly, assuming that some of the gene products from this region represent proteins with antigenic properties, it could be possible to develop a test that can reliably distinguish between the immune response induced by vaccination with M. bovis BCG vaccine strains and infection with M. tuberculosis, or it could be that the products (e.g. polysaccharides) are specific immunogens.
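The two amplicon sizes quoted above are consistent with the chromosomal coordinates given for SEQ ID NO. 1. The Python sketch below is a minimal arithmetic check, assuming that the deleted segment length is the difference between the stated end and start positions; the function name `expected_amplicon_bp` is hypothetical.

```python
# Arithmetic check of the PCR screen; names are hypothetical.

REGION_START = 1696015   # 5' end of SEQ ID NO. 1 (from the text)
REGION_END = 1708746     # 3' end of SEQ ID NO. 1 (from the text)
AMPLICON_DELETED_BP = 722  # amplicon when the region is absent

# Assumption: segment length = end position minus start position.
DELETED_BP = REGION_END - REGION_START


def expected_amplicon_bp(region_present):
    """Expected distance between the flanking primers, in bp."""
    if region_present:
        return AMPLICON_DELETED_BP + DELETED_BP
    return AMPLICON_DELETED_BP


print("segment length:", DELETED_BP)
print("M. bovis / BCG amplicon:", expected_amplicon_bp(False))
print("M. tuberculosis amplicon:", expected_amplicon_bp(True))
```

Under this assumption the segment is 12,731 bp long, and adding it to the 722 bp product gives the 13,453 bp fragment stated for M. tuberculosis.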

The invention also provides for a purified polynucleotide useful as a probe or as a primer, said polynucleotide being chosen from the following group of polynucleotides:

a) a polynucleotide comprising at least 8 consecutive nucleotides of the sequence SEQ ID NO. 1;

b) a polynucleotide whose sequence is fully complementary to the sequence of the polynucleotide defined in a);

c) a polynucleotide that hybridizes under stringent hybridization conditions with the polynucleotide defined in a) or with the polynucleotide defined in b).

For the purpose of defining a polynucleotide or oligonucleotide hybridizing under stringent hybridization conditions, such as above, what is intended is a polynucleotide that hybridizes with a reference polynucleotide under the following hybridization conditions.

The hybridization step is carried out at 65° C. in the presence of 6×SSC buffer, 5×Denhardt's solution, 0.5% SDS and 100 μg/ml of salmon sperm DNA.

For technical information, 1×SSC corresponds to 0.15 M NaCl and 0.05 M sodium citrate; 1×Denhardt's solution corresponds to 0.02% Ficoll, 0.02% polyvinylpyrrolidone and 0.02% bovine serum albumin.

The hybridization step is followed by four washing steps:

two washings of 5 min each, preferably at 65° C. in a 2×SSC and 0.1% SDS buffer,

one washing of 30 min, preferably at 65° C. in a 2×SSC and 0.1% SDS buffer,

one washing of 10 min, preferably at 65° C. in a 0.1×SSC and 0.1% SDS buffer.

A first illustrative useful polynucleotide that is included in the polynucleotide of sequence SEQ ID NO. 1 is the polynucleotide of sequence SEQ ID NO. 2 that corresponds to the SP6 end-sequence of SEQ ID NO. 1.

A second illustrative useful polynucleotide that is included in the polynucleotide of sequence SEQ ID NO. 1 is the polynucleotide of sequence SEQ ID NO. 3 that corresponds to the T7 end-sequence of SEQ ID NO. 1, located on the opposite strand.

The polynucleotide of sequence SEQ ID NO. 1 contains 11 ORFs, the respective locations of which on the sequence of the Mycobacterium tuberculosis chromosome, taking into account the orientation of each ORF on the chromosome, are given hereafter:

The location of ORF1 is comprised between nucleotide at position nt 1695944 and nucleotide at position nt 1696441.

The location of ORF2 is comprised between nucleotide at position nt 1696728 and nucleotide at position nt 1697420.

The location of ORF3 is comprised between nucleotide at position nt 1698096 and nucleotide at position nt 1699892. ORF3 probably encodes a protein having the characteristics of a membrane protein.

The location of ORF4 is comprised between nucleotide at position nt 1700210 and nucleotide at position nt 1701088.

The location of ORF5 is comprised between nucleotide at position nt 1701293 and nucleotide at position nt 1702588. ORF5 encodes a protein having the characteristics of a membrane protein.

The location of ORF6 is comprised between nucleotide at position nt 1703072 and nucleotide at position nt 1704091. ORF6 encodes a protein having the characteristics of a GDP-D-mannose dehydratase.

The location of ORF7 is comprised between nucleotide at position nt 1704091 and nucleotide at position nt 1705056. ORF7 encodes a protein having the characteristics of a nucleotide sugar epimerase involved in colanic acid biosynthesis.

The location of ORF8 is comprised between nucleotide at position nt 1705056 and nucleotide at position nt 1705784.

The location of ORF9 is comprised between nucleotide at position nt 1705808 and nucleotide at position nt 1706593. ORF9 encodes a protein having the characteristics of a colanic acid biosynthesis glycosyl transferase.

The location of ORF10 is comprised between nucleotide at position nt 1706631 and nucleotide at position nt 1707524.

The location of ORF11 is comprised between nucleotide at position nt 1707530 and nucleotide at position nt 1708648. ORF11 encodes a protein similar to a spore coat polysaccharide biosynthesis protein.
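The ORF coordinates listed above can be gathered into a small table to derive lengths. The Python sketch below is illustrative only; it assumes that both stated positions are inclusive, so that the length is end minus start plus one, and the names `ORFS` and `orf_length` are hypothetical.

```python
# ORF coordinates as listed in the description (chromosomal nt).
ORFS = {
    "ORF1": (1695944, 1696441),
    "ORF2": (1696728, 1697420),
    "ORF3": (1698096, 1699892),
    "ORF4": (1700210, 1701088),
    "ORF5": (1701293, 1702588),
    "ORF6": (1703072, 1704091),
    "ORF7": (1704091, 1705056),
    "ORF8": (1705056, 1705784),
    "ORF9": (1705808, 1706593),
    "ORF10": (1706631, 1707524),
    "ORF11": (1707530, 1708648),
}


def orf_length(start, end):
    """Length in nt, assuming both coordinates are inclusive."""
    return end - start + 1


lengths = {name: orf_length(*coords) for name, coords in ORFS.items()}

for name, n in lengths.items():
    # Every length is a whole number of codons under this assumption.
    print(f"{name}: {n} nt, {n // 3} codons, remainder {n % 3}")
```

With inclusive coordinates every listed ORF spans a whole number of codons, which supports that reading of the positions. The shared boundary positions between ORF6 and ORF7, and between ORF7 and ORF8, then correspond to one-nucleotide overlaps.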

A polynucleotide of interest obtained by the above-disclosed method according to the invention may also contain at least one ORF that encodes all or part of acidic, glycine-rich proteins belonging to the PE and PPE families, whose genes are often clustered and based on multiple copies of the polymorphic repetitive sequences. The names PE and PPE derive from the fact that the motifs ProGlu (PE, positions 8, 9) and ProProGlu (PPE, positions 7 to 9) are found near the N-terminus in almost all cases. The members of the PE protein family all have a highly conserved N-terminal domain of ~110 amino acid residues that is predicted to have a globular structure, followed by a C-terminal segment which varies in size, sequence and repeat copy number. Phylogenetic analysis separated the PE family into several groups, the largest of which is the highly repetitive PGRS class containing 55 members, whereas the other groups share very limited sequence similarity in their C-terminal domains. The predicted molecular weights of the PE proteins vary considerably, as a few members contain only the ~110 amino acid N-terminal domain while the majority have C-terminal extensions ranging in size from 100 up to >1400 residues. A striking feature of the PGRS proteins is their exceptional glycine content (up to 50%) due to the presence of multiple tandem repetitions of GlyGlyAla or GlyGlyAsn motifs or variations thereof.
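The positional definition of the two family names can be expressed as a simple check on a protein sequence in one-letter code. The Python sketch below is a rough illustration, not a classification method from the description: the function name `n_terminal_family` and the example sequences are invented, and since the text says the motifs occur at these positions "in almost all cases", a negative result does not exclude membership.

```python
# Rough illustration; function name and sequences are hypothetical.

def n_terminal_family(protein):
    """Classify by the N-terminal motif positions given in the text.

    PPE: ProProGlu at positions 7 to 9 (1-based).
    PE:  ProGlu at positions 8 and 9 (1-based).
    PPE is tested first because its motif contains the PE motif.
    """
    if protein[6:9] == "PPE":
        return "PPE"
    if protein[7:9] == "PE":
        return "PE"
    return None


# Invented example sequences, for illustration only.
print(n_terminal_family("MSFVTTQPEALAA"))   # ProGlu at positions 8, 9
print(n_terminal_family("MDFGALPPEINS"))    # ProProGlu at positions 7 to 9
print(n_terminal_family("MKTAYIAKQRQIS"))   # neither motif
```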

Like the PE family, the PPE protein family also has a conserved N-terminal domain that comprises ~180 amino acid residues followed by C-terminal segments that vary considerably in sequence and length. These proteins fall into at least three groups, one of which constitutes the MPTR class characterised by the presence of multiple, tandem copies of the motif AsnXGlyXGlyAsnXGly (SEQ ID NO. 730). The second subgroup contains a characteristic, well-conserved motif around position 350 (GlyXXSerValProXXTrp) (SEQ ID NO. 731), whereas the other group contains proteins that are unrelated except for the presence of the common 180-residue PPE domain. C-terminal extensions may range in size from 00 up to 3500 residues.
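The two PPE motifs named above translate directly into regular expressions over one-letter amino acid code, with X read as any residue. The Python sketch below is illustrative only; the pattern names and the test peptides are invented and do not come from any sequence in the description.

```python
import re

# X in the motif notation is taken to mean "any residue".
MPTR_MOTIF = re.compile(r"N.G.GN.G")       # AsnXGlyXGlyAsnXGly
SVP_MOTIF = re.compile(r"G..SVP..W")       # GlyXXSerValProXXTrp


def count_mptr_repeats(protein):
    """Number of non-overlapping MPTR motif copies in a sequence."""
    return len(MPTR_MOTIF.findall(protein))


def svp_motif_position(protein):
    """1-based start of the first GlyXXSerValProXXTrp match, or None."""
    match = SVP_MOTIF.search(protein)
    return match.start() + 1 if match else None


# Invented peptides, for illustration only.
print(count_mptr_repeats("AANAGAGNAGNTGSGNLGAA"))
print(svp_motif_position("MAGAASVPAAWT"))
```

The text places the second motif "around position 350", so a real check would also compare the reported position with that region of the protein.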

One member of the PGRS sub-family, the WHO antigen 22T (Abou-Zeid et al., 1991), a 55 kD protein capable of binding fibronectin, is produced during disease and elicits a variable antibody response, suggesting either that individuals mount different immune responses or that this PGRS protein may not be produced in this form by all strains of M. tuberculosis. In other words, at least some PE_PGRS coding sequences encode proteins that are involved in the recognition of M. tuberculosis by the immune system of the infected host. Therefore, differences in the PGRS sequences could represent the principal source of antigenic variation in the otherwise genetically and antigenically homogeneous bacterium.

By performing the method of the invention using the M tuberculosis BACbased DNA library I-1945, the inventors have discovered the occurence ofsequence differences between a given PGRS encoding ORF (ORF reference onthe genomic sequence of M tuberculosis Rv0746) of M tuberculosis and itscounterpart sequence in the genome of M bovis BCG.

More precisely, the inventors have determined that one ORF contained in BAC vector N^(o) Rv418 of the M tuberculosis I-1945 DNA library carries both base additions and base deletions when compared with the corresponding ORF in the genome of M bovis BCG that is contained in the BAC vector N^(o) X0175 of the M bovis BCG I-2049 DNA library. The variations observed in the base sequences correspond to variations in the C-terminal part of the amino acid sequence of the PGRS ORF translation product.

As shown in FIG. 6, an amino acid stretch of 29 residues in length is present in this M tuberculosis PGRS (ORF reference Rv0746) and is absent from the ORF counterpart of M bovis BCG, namely the following amino acid sequence:

NH₂-GGAGGAGGSSAGGGGAGGAGGAGGWLLGD-COOH (SEQ ID NO. 732).

Furthermore, FIG. 6 also shows that an amino acid stretch of 45 residues in length is absent from this M tuberculosis PGRS and is present in the ORF counterpart of M bovis BCG, namely the following amino acid sequence:

NH₂-GAGGIGGIGGNANGGAGGNGGTGGQLWGSGGAGVEGGAALSVGDT-COOH (SEQ ID NO. 733).

Similar observations were made with PPE ORF Rv0442, which showed a 5 codon deletion relative to a M bovis amino acid sequence.

If the polymorphism associated with the PE-PGRS or PPE ORFs results in extensive antigenic variability or reduced antigen presentation, this would be of immense significance for vaccine design and for understanding protective immunity in tuberculosis, and could possibly explain the varied responses seen in different BCG vaccination programmes.

There are several striking parallels between the PGRS proteins and the Epstein-Barr virus-encoded nuclear antigens (EBNA). Both polypeptide families are glycine-rich, contain Gly-Ala repeats that represent more than one third of the molecule, and display variation in the length of the repeat region between different isolates. The Gly-Ala repeat region of EBNA1 has been shown to function as a cis-acting inhibitor of antigen processing and MHC class I-restricted antigen presentation (Levitskaya et al., 1995). The fact that MHC class I knock-out mice are extremely susceptible to M tuberculosis underlines the importance of MHC class I antigen presentation in protection against tuberculosis. Therefore, it is possible that the PE/PPE protein families also play some role in inhibiting antigen presentation, allowing the bacillus to hide from the host's immune system.

As such, the novel and nonobvious PGRS polynucleotide from M bovis which is homologous to the M tuberculosis ORF Rv0746, and which is contained in the BAC clone N^(o) X0175 (see Table 4 for SP6 and T7 end-sequences of clone N^(o) X0175) of the I-2049 M bovis BCG BAC DNA library, is part of the present invention, as it represents a starting material for defining specific probes or primers useful for detecting antigenic variability in mycobacterial strains and possible inhibition of antigen processing, as well as for differentiating M tuberculosis from M bovis BCG.

Thus, a further object of the invention consists in a polynucleotide comprising the sequence SEQ ID N^(o)4.

Polynucleotides of interest have been defined by the inventors as useful detection tools in order to differentiate M tuberculosis from M bovis BCG. Such polynucleotides are contained in the coding sequence, 45 amino acids in length, that is present in M bovis BCG but absent from M tuberculosis. This polynucleotide has a sequence beginning (5′ end) at the nucleotide at position nt 729 of the sequence SEQ ID N^(o)4 and ending (3′ end) at the nucleotide in position nt 863 of the sequence SEQ ID N^(o)4.
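The coordinates given above can be checked against the length of the 45-residue stretch of SEQ ID NO. 733 with a few lines of arithmetic. The sketch below assumes only the peptide sequence and the positions nt 729 and nt 863 quoted in the text; the helper name `coding_length` is hypothetical.

```python
# Peptide present in M bovis BCG and absent from M tuberculosis (SEQ ID NO. 733),
# written without the spacing artefact of the printed sequence.
SEQ_ID_733 = "GAGGIGGIGGNANGGAGGNGGTGGQLWGSGGAGVEGGAALSVGDT"


def coding_length(start_nt, end_nt):
    """Number of nucleotides in an inclusive interval [start_nt, end_nt]."""
    return end_nt - start_nt + 1


n_residues = len(SEQ_ID_733)
n_nucleotides = coding_length(729, 863)
glycine_fraction = SEQ_ID_733.count("G") / n_residues

print(n_residues, n_nucleotides, n_nucleotides == 3 * n_residues)
print(f"glycine fraction: {glycine_fraction:.2f}")
```

The interval nt 729 to nt 863 is exactly three nucleotides per residue of the peptide, which is what one expects if the polynucleotide is the in-frame coding sequence of that stretch.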

Thus, part of the present invention is also a polynucleotide which is chosen among the following group of polynucleotides:

a) a polynucleotide comprising at least 8 consecutive nucleotides of the nucleotide sequence SEQ ID N^(o)5;

b) a polynucleotide whose sequence is fully complementary to the sequence of the polynucleotide defined in a);

c) a polynucleotide that hybridizes under stringent hybridization conditions with the polynucleotide defined in a) or with the polynucleotide defined in b).

The stringent hybridization conditions for the purpose of defining the above disclosed polynucleotide are defined herein before in the specification.
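Criteria a) and b) above are purely sequence-based and can be sketched in code. The example below is a minimal illustration under stated assumptions: `reference` is an invented placeholder and not SEQ ID N^(o)5, and the function names are hypothetical. Criterion c) depends on experimental hybridization conditions and is not modelled.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Fully complementary strand, written 5' to 3' (criterion b)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def shares_run(candidate, reference, k=8):
    """True if candidate contains at least k consecutive nucleotides of reference (criterion a)."""
    candidate, reference = candidate.upper(), reference.upper()
    kmers = {reference[i:i + k] for i in range(len(reference) - k + 1)}
    return any(candidate[i:i + k] in kmers for i in range(len(candidate) - k + 1))


# Invented placeholder sequences for illustration only.
reference = "ATGGCGGGTGCCGGAGGCATT"
candidate = "TTTGGTGCCGGATTT"

print(shares_run(candidate, reference))
print(reverse_complement(reference))
```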

The invention also provides for a BAC-based Mycobacterium tuberculosis strain H37Rv genomic DNA library that has been deposited in the Collection Nationale de Cultures de Microorganismes on Nov. 19, 1997 under the accession number I-1945.

A further object of the invention consists in a recombinant BAC vector which is chosen among the group consisting of the recombinant BAC vectors belonging to the BAC-based DNA library I-1945.

Generally, a recombinant BAC vector of interest may be chosen among the following set or group of BAC vectors contained in the BAC-based DNA library I-1945:

Rv101; Rv102; Rv103; Rv104; Rv105; Rv106; Rv107; Rv108; Rv109; Rv10; Rv110; Rv111; Rv112; Rv113; Rv114; Rv115; Rv116; Rv117; Rv118; Rv119; Rv11; Rv120; Rv121; Rv122; Rv123; Rv124; Rv126; Rv127; Rv128; Rv129; Rv130; Rv132; Rv134; Rv135; Rv136; Rv137; Rv138; Rv139; Rv13; Rv140; Rv141; Rv142; Rv143; Rv144; Rv145; Rv146; Rv147; Rv148; Rv149; Rv14; Rv150; Rv151; Rv152; Rv153; Rv154; Rv155; Rv156; Rv157; Rv159; Rv15; Rv160; Rv161; Rv162; Rv163; Rv164; Rv165; Rv166; Rv167; Rv169; Rv16; Rv170; Rv171; Rv172; Rv173; Rv174; Rv175; Rv176; Rv177; Rv178; Rv179; Rv17; Rv180; Rv181; Rv182; Rv183; Rv184; Rv185; Rv186; Rv187; Rv188; Rv18; Rv190; Rv191; Rv192; Rv193; Rv194; Rv195; Rv196; Rv19; Rv1; Rv201; Rv204; Rv205; Rv207; Rv209; Rv20; Rv214; Rv215; Rv217; Rv218; Rv219; Rv21; Rv220; Rv221; Rv222; Rv223; Rv224; Rv225; Rv226; Rv227; Rv228; Rv229; Rv22; Rv230; Rv231; Rv232; Rv233; Rv234; Rv235; Rv237; Rv240; Rv241; Rv243; Rv244; Rv245; Rv246; Rv247; Rv249; Rv24; Rv251; Rv252; Rv253; Rv254; Rv255; Rv257; Rv258; Rv259; Rv25; Rv260; Rv261; Rv262; Rv263; Rv264; Rv265; Rv266; Rv267; Rv268; Rv269; Rv26; Rv270; Rv271; Rv272; Rv273; Rv274; Rv275; Rv276; Rv277; Rv278; Rv279; Rv27; Rv280; Rv281; Rv282; Rv283; Rv284; Rv285; Rv286; Rv287; Rv288; Rv289; Rv28; Rv290; Rv291; Rv292; Rv293; Rv294; Rv295; Rv296; Rv29; Rv2; Rv301; Rv302; Rv303; Rv304; Rv306; Rv307; Rv308; Rv309; Rv30; Rv310; Rv31; Rv312; Rv313; Rv314; Rv315; Rv316; Rv317; Rv318; Rv319; Rv31; Rv32; Rv322; Rv327; Rv328; Rv329; Rv32; Rv330; Rv331; Rv333; Rv334; Rv335; Rv336; Rv337; Rv338; Rv339; Rv33; Rv340; Rv341; Rv343; Rv344; Rv346; Rv347; Rv348; Rv349; Rv34; Rv350; Rv351; Rv352; Rv353; Rv354; Rv355; Rv356; Rv357; Rv358; Rv359; Rv35; Rv360; Rv361; Rv363; Rv364; Rv365; Rv366; Rv367; Rv368; Rv369; Rv36; Rv370; Rv371; Rv373; Rv374; Rv375; Rv376; Rv377; Rv378; Rv379; Rv37; Rv381; Rv382; Rv383; Rv384; Rv385; Rv386; Rv387; Rv388; Rv389; Rv38; Rv390; Rv391; Rv392; Rv393; Rv396; Rv39; Rv3; Rv40; Rv412; Rv413; Rv414; Rv415; Rv416; Rv417; Rv418; Rv419; Rv41; Rv42; Rv43; Rv44; Rv45; Rv46; Rv47; Rv48; Rv49; Rv4; Rv50; Rv51; Rv52; Rv53; Rv54; Rv55; Rv56; Rv57; Rv58; Rv59; Rv5; Rv60; Rv61; Rv62; Rv63; Rv64; Rv65; Rv66; Rv67; Rv68; Rv69; Rv6; Rv70; Rv71; Rv72; Rv73; Rv74; Rv75; Rv76; Rv77; Rv78; Rv79; Rv7; Rv80; Rv81; Rv82; Rv83; Rv84; Rv85; Rv86; Rv87; Rv88; Rv89; Rv8; Rv90; Rv91; Rv92; Rv94; Rv95; Rv96; Rv9.

The end sequences of the polynucleotide inserts of each of the above clones, corresponding respectively to the sequences adjacent to the T7 promoter and to the Sp6 promoter on the BAC vector, are shown in Table 3.

It has been shown by the inventors that the minimal overlapping set of BAC vectors of the BAC-based DNA library I-1945 contains 68 unique BAC clones and spans almost the whole H37Rv chromosome, with the exception of a single gap of approximately 150 kb.
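The notion of a minimal overlapping set can be illustrated with a greedy interval cover. This is a generic sketch and not the procedure used by the inventors: the function name `minimal_tiling_path`, the linear 0-based half-open coordinates and the toy clones are all assumptions made for illustration.

```python
def minimal_tiling_path(clones, genome_length):
    """Greedy interval cover.

    clones: list of (name, start, end) with 0-based, half-open coordinates.
    Returns (path, gaps): the names of the selected clones in order, and the
    list of (start, end) regions that no clone covers.
    """
    ordered = sorted(clones, key=lambda c: (c[1], -c[2]))
    path, gaps = [], []
    covered_to = 0
    i = 0
    while covered_to < genome_length:
        best = None
        # Among clones starting at or before the covered point,
        # keep the one that reaches furthest.
        while i < len(ordered) and ordered[i][1] <= covered_to:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is not None and best[2] > covered_to:
            path.append(best[0])
            covered_to = best[2]
        elif i < len(ordered):
            # No clone extends the covered region: record a gap and jump ahead.
            gaps.append((covered_to, ordered[i][1]))
            covered_to = ordered[i][1]
        else:
            gaps.append((covered_to, genome_length))
            break
    return path, gaps


# Invented toy map (coordinates in kb), not data from the I-1945 library.
toy_clones = [("A", 0, 300), ("B", 100, 350), ("C", 250, 600),
              ("D", 300, 500), ("E", 750, 1000)]
print(minimal_tiling_path(toy_clones, 1000))
```

In the toy map the clones B and D are redundant, and the region between 600 and 750 is reported as a gap, analogous to the single uncovered region mentioned above.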

More specifically, a recombinant BAC vector of interest is chosen among the following set or group of BAC vectors from the BAC-based DNA library I-1945, the location of whose DNA inserts on the chromosome of M. tuberculosis is shown in FIG. 3:

Rv234; Rv351; Rv166; Rv35; Rv415; Rv404; Rv209; Rv272; Rv30; Rv228; Rv233; Rv38; Rv280; Rv177; Rv48; Rv374; Rv151; Rv238; Rv156; Rv92; Rv3; Rv403; Rv322; Rv243; Rv330; Rv285; Rv233; Rv219; Rv416; Rv67; Rv222; Rv149; Rv279; Rv87; Rv273; Rv266; Rv25; Rv136; Rv414; Rv13; Rv289; Rv60; Rv140; Rv5; Rv165; Rv215; Rv329; Rv240; Rv19; Rv74; Rv411; Rv167; Rv56; Rv80; Rv164; Rv59; Rv313; Rv265; Rv308; Rv220; Rv258; Rv339; Rv121; Rv419; Rv418; Rv45; Rv217; Rv134; Rv17; Rv103; Rv21; Rv22; Rv2; Rv270; Rv267; Rv174; Rv257; Rv44; Rv71; Rv7; Rv27; Rv191; Rv230; Rv128; Rv407; Rv106; Rv39; Rv255; Rv74; Rv355; Rv268; Rv58; Rv173; Rv264; Rv417; Rv401; Rv144; Rv302; Rv81; Rv163; Rv281; Rv221; Rv420; Rv175; Rv86; Rv412; Rv73; Rv269; Rv214; Rv287; Rv42; Rv143.

The polynucleotides disclosed in Table 3 may be used as probes in order to select a given clone of the BAC DNA library I-1945 for further use.

The invention also provides for a BAC-based Mycobacterium bovis strain Pasteur genomic DNA library that has been deposited in the Collection Nationale de Cultures de Microorganismes on Jun. 30, 1998 under the accession number I-2049.

A further object of the invention consists in a recombinant BAC vector which is chosen among the group consisting of the recombinant BAC vectors belonging to the BAC-based DNA library I-2049. This DNA library contains approximately 1600 clones. The average insert size is estimated to be ˜80 kb.
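The clone number and average insert size quoted above translate into an approximate genome redundancy. The sketch below assumes a genome size of about 4.4 Mb, a figure that is not stated in this passage and is used purely as an illustrative assumption; the function names are hypothetical. The second function applies the standard random-clone estimate P = 1 - (1 - f)^N, where f is the fraction of the genome carried by one insert.

```python
def fold_coverage(n_clones, mean_insert_bp, genome_bp):
    """Total cloned DNA expressed in genome equivalents."""
    return n_clones * mean_insert_bp / genome_bp


def probability_locus_cloned(n_clones, mean_insert_bp, genome_bp):
    """Chance that a given locus is present in at least one random clone."""
    fraction = mean_insert_bp / genome_bp
    return 1.0 - (1.0 - fraction) ** n_clones


ASSUMED_GENOME_BP = 4_400_000  # assumption for illustration, not from the text

print(round(fold_coverage(1600, 80_000, ASSUMED_GENOME_BP), 1))
print(probability_locus_cloned(1600, 80_000, ASSUMED_GENOME_BP))
```

Under this assumption the library would represent roughly 29 genome equivalents; the estimate ignores cloning bias, so real coverage may be lower in regions that are difficult to clone.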

Generally, a recombinant BAC vector of interest may be chosen among the following set or group of BAC vectors contained in the BAC-based DNA library I-2049:

X0001; X0002; X0003; X0004; X0006; X0007; X0008; X0009; X000; X0012; X0013; X0014; X0015; X0016; X0017; X0018; X0019; X0020; X0021; X0175.

The end sequences of the polynucleotide inserts of each of the above clones, corresponding respectively to the sequences adjacent to the T7 promoter and to the Sp6 promoter on the BAC vector, are shown in Table 4.

The polynucleotides disclosed in Table 4 may be used as probes in order to select a given clone of the BAC DNA library I-2049 for further use.

Also part of the invention are the polynucleotide inserts that are contained in the above described BAC vectors and that are useful as primers or probes.

These polynucleotides and nucleic acid fragments may be used as primers for use in amplification reactions, or as nucleic acid probes.

PCR is described in U.S. Pat. No. 4,683,202. The amplified fragments may be identified by agarose or polyacrylamide gel electrophoresis, by capillary electrophoresis or alternatively by a chromatography technique (gel filtration, hydrophobic chromatography or ion exchange chromatography). The specificity of the amplification may be ensured by a molecular hybridization using, for example, one of the initial primers as a nucleic acid probe.

Amplified nucleotide fragments are used as probes in hybridization reactions in order to detect the presence of one polynucleotide according to the present invention or in order to detect mutations in the genome of the given mycobacterium of interest, specifically a mycobacterium belonging to the Mycobacterium tuberculosis complex and more specifically Mycobacterium tuberculosis and Mycobacterium bovis BCG.

Also part of the present invention are the amplified nucleic acid fragments ("amplicons") defined herein above.

These probes and amplicons may be radioactively or non-radioactively labeled, using for example enzymes or fluorescent compounds.

Other techniques related to nucleic acid amplification may also be used and are generally preferred to the PCR technique.

The Strand Displacement Amplification (SDA) technique (Walker et al., 1992) is an isothermal amplification technique based on the ability of a restriction enzyme to cleave one of the strands at its recognition site (which is in a hemiphosphorothioate form), on the property of a DNA polymerase to initiate the synthesis of a new strand from the 3′OH end generated by the restriction enzyme, and on the property of this DNA polymerase to displace the previously synthesized strand located downstream. The SDA method comprises two main steps:

a) The synthesis, in the presence of dCTP-alpha-S, of DNA molecules that are flanked by the restriction sites that may be cleaved by an appropriate enzyme.

b) The exponential amplification of these DNA molecules modified as such, by enzyme cleavage, strand displacement and copying of the displaced strands. The steps of cleavage, strand displacement and copying are repeated a sufficient number of times in order to obtain an adequate sensitivity of the assay.

The SDA technique was initially carried out using the restriction endonuclease HincII but is now generally practised with an endonuclease from Bacillus stearothermophilus (BsoBI) and a fragment of a DNA polymerase which is devoid of any 5′→3′ exonuclease activity, isolated from Bacillus caldotenax (exo-Bca) [=exo-minus-Bca]. Both enzymes are able to operate at 60° C. and the system is now optimized in order to allow the use of dUTP and decontamination by UDG. When using this technique, as described by Spargo et al. in 1996, the doubling time of the target DNA is 26 seconds and the amplification rate is 10¹⁰ after an incubation time of 15 min at 60° C.
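The two figures quoted from Spargo et al. (a 26 second doubling time and an amplification of the order of 10¹⁰ in 15 min) can be related by simple exponential growth. The sketch below assumes ideal, uninterrupted doubling throughout the incubation, which is an idealisation; the function name is hypothetical.

```python
def fold_amplification(doubling_time_s, incubation_s):
    """Ideal exponential amplification: 2 raised to the number of doublings."""
    return 2.0 ** (incubation_s / doubling_time_s)


doublings = (15 * 60) / 26
print(f"{doublings:.1f} doublings")
print(f"{fold_amplification(26, 15 * 60):.2e}-fold")
```

About 35 doublings fit into 15 minutes, which gives an ideal amplification between 10¹⁰ and 10¹¹, in line with the order of magnitude stated above.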

The SDA amplification technique is easier to perform than PCR (only a single thermostated waterbath device is necessary) and is faster than the other amplification methods.

Thus, another object of the present invention consists in using the nucleic acid fragments according to the invention (primers) in a method of DNA or RNA amplification according to the SDA technique. For performing SDA, two pairs of primers are used: a pair of external primers (B1, B2) consisting of a sequence specific for the target polynucleotide of interest and a pair of internal primers (S1, S2) consisting of a fusion oligonucleotide carrying a site that is recognized by a restriction endonuclease, for example the enzyme BsoBI.

The operating conditions to perform SDA with such primers are described in Spargo et al., 1996.

The polynucleotides of the invention and their above described fragments, especially the primers according to the invention, are useful as technical means for performing different target nucleic acid amplification methods such as:

TAS (Transcription-based Amplification System), described by Kwoh et al. in 1989.

3SR (Self-Sustained Sequence Replication), described by Guatelli et al. in 1990.

NASBA (Nucleic Acid Sequence Based Amplification), described by Kievits et al. in 1991.

TMA (Transcription Mediated Amplification).

The polynucleotides according to the invention are also useful as technical means for performing methods for amplification or modification of a nucleic acid used as a probe, such as:

LCR (Ligase Chain Reaction), described by Landegren et al. in 1988 and improved by Barany et al. in 1991, who employ a thermostable ligase.

RCR (Repair Chain Reaction), described by Segev et al. in 1992.

CPR (Cycling Probe Reaction), described by Duck et al. in 1990.

Q-beta replicase reaction, described by Miele et al. in 1983 and improved by Chu et al. in 1986, Lizardi et al. in 1988 and by Burg et al. and Stone et al. in 1996.

When the target polynucleotide to be detected is an RNA, for example an mRNA, a reverse transcriptase enzyme will be used before the amplification reaction in order to obtain a cDNA from the RNA contained in the biological sample. The generated cDNA is subsequently used as the nucleic acid target for the primers or the probes used in an amplification process or a detection process according to the present invention.

The non-labeled polynucleotides or oligonucleotides of the invention may be directly used as probes. Nevertheless, the polynucleotides or oligonucleotides are generally labeled with a radioactive element (³²P, ³⁵S, ³H, ¹²⁵I) or by a nonisotopic molecule (for example, biotin, acetylaminofluorene, digoxigenin, 5-bromodeoxyuridine, fluorescein) in order to generate probes that are useful for numerous applications.

Examples of non-radioactive labeling of nucleic acid fragments are described in the French patent N^(o) FR-7810975 or by Urdea et al. or Sanchez-Pescador et al., 1988.

In the latter case, other labeling techniques may also be used, such as those described in the French patents FR-2 422 956 and 2 518 755. The hybridization step may be performed in different ways (Matthews et al., 1988). The most general method consists of immobilizing the nucleic acid that has been extracted from the biological sample onto a substrate (nitrocellulose, nylon, polystyrene) and then incubating, under defined conditions, the target nucleic acid with the probe. Following the hybridization step, the excess of the specific probe is discarded and the hybrid molecules formed are detected by an appropriate method (radioactivity, fluorescence or enzyme activity measurement).

Advantageously, the probes according to the present invention may have structural characteristics such that they allow signal amplification, such structural characteristics being, for example, those of branched DNA probes as described by Urdea et al. in 1991 or in the European patent N^(o) EP-0 225 807 (Chiron).

In another advantageous embodiment of the probes according to the present invention, the latter may be used as "capture probes", and are for this purpose immobilized on a substrate in order to capture the target nucleic acid contained in a biological sample. The captured target nucleic acid is subsequently detected with a second probe which recognizes a sequence of the target nucleic acid which is different from the sequence recognized by the capture probe.

The oligonucleotide probes according to the present invention may also be used in a detection device comprising a matrix library of probes immobilized on a substrate, the sequence of each probe of a given length being localized in a shift of one or several bases, one from the other, each probe of the matrix library thus being complementary to a distinct sequence of the target nucleic acid. Optionally, the substrate of the matrix may be a material able to act as an electron donor, the detection of the matrix positions in which a hybridization has occurred being subsequently determined by an electronic device. Such matrix libraries of probes and methods of specific detection of a target nucleic acid are described in the European patent application N^(o) EP-0 713 016 (Affymax technologies) and also in U.S. Pat. No. 5,202,231 (Drmanac).
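The idea of a matrix library in which each probe is shifted by one or several bases relative to its neighbour can be sketched as follows. This is a generic illustration, not the design of any cited device; the function names and the short target sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complementary strand written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def tiled_probes(target, probe_length, shift=1):
    """Return (offset, probe) pairs; each probe is complementary to the
    target window of probe_length bases starting at offset."""
    target = target.upper()
    last_start = len(target) - probe_length
    return [
        (offset, reverse_complement(target[offset:offset + probe_length]))
        for offset in range(0, last_start + 1, shift)
    ]


# Invented 10-base target, tiled with 8-mers shifted by one base.
for offset, probe in tiled_probes("ATGCATGCAT", 8):
    print(offset, probe)
```

With a shift of one base, a target of length n yields n - L + 1 probes of length L, so the matrix grows linearly with the length of the region to be interrogated.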

Since almost the whole length of a mycobacterial chromosome is covered by the BAC-based genomic DNA libraries according to the present invention (i.e. 97% of the M. tuberculosis chromosome is covered by the BAC library I-1945), these DNA libraries will play an important role in a plurality of post-genomic applications, such as in mycobacterial gene expression studies where the canonical set of BACs could be used as a matrix for hybridization studies. Probing such matrices with cDNA probes prepared from total mRNA will uncover genetic loci induced or repressed under different physiological conditions (Chuang et al., 1993; Trieselmann et al., 1992). As such, the H37Rv BAC library represents a fundamental resource for present and future genomics investigations.

The BAC vectors or the polynucleotide inserts contained therein may be directly used as probes, for example when immobilized on a substrate such as described herein before.

The BAC vectors or their polynucleotide inserts may be directly adsorbed on a nitrocellulose membrane at predetermined locations, on which one or several polynucleotides to be tested are then allowed to hybridize therewith.

Preferably, a collection of BAC vectors that spans the whole genome of the mycobacterium under testing will be immobilized, such as, for example, the set of 68 BAC vectors of the I-1945 DNA library that is described elsewhere in the specification and shown in FIG. 3.

The immobilization and hybridization steps may be performed as described in the present Materials and Methods Section.

As another illustrative embodiment of the use of the BAC vectors of the invention as polynucleotide probes, these vectors may be useful to perform a transcriptional activity analysis of mycobacteria growing in different environmental conditions, for example under conditions in which a stress response is expected, as is the case at an elevated temperature, for example 40° C.

In this specific embodiment of the invention, Genescreen membranes may be used to immobilize the restriction endonuclease digests (HindIII digests for the BAC DNA library I-1945) of the BAC vectors by transfer from a gel (Trieselmann et al., 1992).

Alternatively, the BAC vectors may be immobilized for dot blot experiments as follows. First, the DNA concentration of each BAC clone is determined by hybridization of blots of clone DNAs and of a BAC vector concentration standard with a BAC vector specific DNA probe. Hybridization is quantified by the Betascope 603 blot analyzer (Betagen Corp.), which collects beta particles directly from the blot with high efficiency. Then, 0.5 μg of each clone DNA is incubated in 0.25 M NaOH and 10 mM EDTA at 65° C. for 60 min to denature the DNA and degrade residual RNA contaminants. By using a manifold filtration system (21 by 21 wells), each clone DNA is blotted onto a GeneScreen Plus nylon membrane in the alkaline solution. After neutralization, the blots are baked at 85° C. for 2 h under vacuum. Positive and negative controls are added when necessary. For this procedure, reference may be made to the article of Chuang et al. (1993).

For RNA extractions, cells grown in a suitable volume of culture medium may, for example, be immediately mixed with an equal volume of crushed ice at −70° C. and spun at 4° C. in a 50 ml centrifugation tube. The cell pellet is then suspended in 0.6 ml of ice-cold buffer (10 mM KCl, 5 mM MgCl₂, 10 mM Tris; pH 7.4) and then immediately added to 0.6 ml of hot lysis buffer (0.4 M NaCl, 40 mM EDTA, 1% beta-mercaptoethanol, 1% SDS, 20 mM Tris; pH 7.4) containing 100 μl of water-saturated phenol. This mixture is incubated in a boiling water bath for 40 s. The debris is removed by centrifugation. The supernatant is extracted with phenol-chloroform five times, ethanol precipitated, and dried. The dried RNA pellet is dissolved in water before use.

Labeled total cDNA may then be prepared by the following method. The reaction mixture contains 15 μg of the previously prepared total RNA, 5 μg of pd(N₆) (random hexamers from Pharmacia Inc.), 0.5 mM dATP, 0.5 mM dGTP and 0.5 mM dTTP, 5 μM dCTP, 100 μCi of [α-³²P]dCTP (3,000 Ci/mmol), 50 mM Tris-HCl (pH 8.3), 6 mM MgCl₂, 40 mM KCl, 0.5 U of avian myeloblastosis virus reverse transcriptase (Life Science Inc.) in a total volume of 50 μl. The reaction is allowed to continue overnight at room temperature. EDTA and NaOH are then added to final concentrations of 50 mM and 0.25 M, respectively, and the mixture is incubated at 65° C. for 30 min to degrade the RNA templates. The cDNA is then ready to use after neutralization by adding HCl and Tris buffer.
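The low dCTP concentration in this reaction mixture keeps the specific activity of the labeled nucleotide pool high, as the following arithmetic sketch suggests. It assumes that the 5 μM dCTP is unlabeled and that the 100 μCi of [α-³²P]dCTP is added on top of it; that reading of the recipe, and the function names, are assumptions made for illustration.

```python
def pmol_from_concentration(conc_uM, volume_ul):
    """1 µM equals 1 pmol/µl, so µM x µl gives pmol."""
    return conc_uM * volume_ul


def pmol_from_activity(activity_uCi, specific_activity_Ci_per_mmol):
    """1 Ci/mmol equals 1 µCi/nmol; the result is converted from nmol to pmol."""
    return activity_uCi / specific_activity_Ci_per_mmol * 1000.0


cold_dctp = pmol_from_concentration(5, 50)   # unlabeled dCTP in 50 µl
hot_dctp = pmol_from_activity(100, 3000)     # labeled dCTP
labeled_fraction = hot_dctp / (cold_dctp + hot_dctp)

print(f"unlabeled dCTP: {cold_dctp:.0f} pmol")
print(f"labeled dCTP:   {hot_dctp:.1f} pmol")
print(f"labeled fraction of the dCTP pool: {labeled_fraction:.1%}")
```

Under these assumptions roughly one dCTP molecule in nine carries the label, whereas at the 0.5 mM used for the other three nucleotides the labeled fraction would be far smaller.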

The hybridization step may be performed as described by Chuang et al. (1993) and briefly disclosed hereinafter. The DNA dot blot is hybridized to ³²P-labeled total cDNA in a solution containing 0.1% polyvinylpyrrolidone, 0.1% Ficoll, 0.1% sodium PPi, 0.1% bovine serum albumin, 0.5% SDS, 100 mM NaCl, and 0.1 mM sodium citrate, pH 7.2, at 65° C. for 2 days and then washed with a solution containing 0.1% SDS, 100 mM NaCl, and 10 mM Na-citrate, pH 7.2. The same dot blot is used for hybridization with both control and experimental cDNAs, with an alkaline probe stripping procedure (soaked twice in 0.25 M NaOH-0.75 M NaCl at room temperature, 30 min each, neutralized, and completely dried at 65° C. for at least 30 min) between the two hybridizations. Quantification may be done with the Betascope 603 blot analyzer (Betagen Corp.).
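Because the same dot blot is hybridized successively with control and experimental cDNAs, the two sets of counts can be compared clone by clone. The sketch below shows one simple way to do so; the clone labels, the counts and the two-fold threshold are all invented for illustration and are not taken from Chuang et al. or from the invention.

```python
import math


def log2_ratios(control, experimental, floor=1.0):
    """log2(experimental / control) per clone; counts are floored to avoid dividing by zero."""
    return {
        clone: math.log2(max(experimental[clone], floor) / max(control[clone], floor))
        for clone in control
    }


def classify(ratios, threshold=1.0):
    """Label each clone as induced, repressed or unchanged (threshold in log2 units)."""
    calls = {}
    for clone, ratio in ratios.items():
        if ratio >= threshold:
            calls[clone] = "induced"
        elif ratio <= -threshold:
            calls[clone] = "repressed"
        else:
            calls[clone] = "unchanged"
    return calls


# Invented blot counts for three hypothetical clones.
control_counts = {"clone_A": 1200.0, "clone_B": 800.0, "clone_C": 400.0}
stress_counts = {"clone_A": 5000.0, "clone_B": 820.0, "clone_C": 90.0}

print(classify(log2_ratios(control_counts, stress_counts)))
```

In practice the counts would first need normalisation between the two hybridizations, for instance against the positive controls spotted on the blot, before such ratios are interpreted.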

As follows from the above technical teachings, another object of the invention consists in a method for detecting the presence of mycobacteria in a biological sample comprising the steps of:

a) bringing into contact the recombinant BAC vector or a purified polynucleotide according to the invention with a biological sample;

b) detecting the hybrid nucleic acid molecule formed between said purified polynucleotide and the nucleic acid molecules contained within the biological sample.

The invention further deals with a method for detecting the presence of mycobacteria in a biological sample comprising the steps of:

a) bringing into contact the recombinant BAC vector or a purified polynucleotide according to the invention that has been immobilized onto a substrate with a biological sample;

b) bringing into contact the hybrid nucleic acid molecule formed between said purified polynucleotide and the nucleic acid contained in the biological sample with a labeled recombinant BAC vector or a polynucleotide according to the invention, provided that said polynucleotide and the polynucleotide of step a) have non-overlapping sequences.

Another object of the invention consists in a method for detecting the presence of mycobacteria in a biological sample comprising the steps of:

a) bringing into contact the nucleic acid molecules contained in the biological sample with a pair of primers according to the invention;

b) amplifying said nucleic acid molecules;

c) detecting the nucleic acid fragments that have been amplified, for example by gel electrophoresis or with a labeled polynucleotide according to the invention.

In one specific embodiment of the above detection and/or amplification methods, said methods comprise an additional step wherein, before step a), the nucleic acid molecules of the biological sample have been made available to a hybridization reaction.

In another specific embodiment of the above detection methods, said methods comprise an additional step, wherein, before the detection step, the nucleic acid molecules that are not hybridized with the immobilized purified polynucleotide are removed.

Also part of the invention is a kit for detecting mycobacteria in a biological sample comprising:

a) a recombinant BAC vector or a purified polynucleotide according to the invention;

b) reagents necessary to perform a nucleic acid hybridization reaction.

The invention also pertains to a kit for detecting mycobacteria in a biological sample comprising:

a) a recombinant BAC vector or a purified polynucleotide according to the invention that is immobilized onto a substrate;

b) reagents necessary to perform a nucleic acid hybridization reaction;

c) a purified polynucleotide according to the invention which is radioactively or non-radioactively labeled, provided that said polynucleotide and the polynucleotide of step a) have non-overlapping sequences.

Moreover, the invention provides for a kit for detecting mycobacteria in a biological sample comprising:

a) a pair of purified primers according to the invention;

b) reagents necessary to perform a nucleic acid amplification reaction;

c) optionally, a purified polynucleotide according to the invention useful as a probe.

The invention also embraces a method for detecting the presence of a genomic DNA, a cDNA or an mRNA of mycobacteria in a biological sample, comprising the steps of:

a) bringing into contact the biological sample with a plurality of BAC vectors according to the invention or purified polynucleotides according to the invention, that are immobilized on a substrate;

b) detecting the hybrid complexes formed.

The invention also provides a kit for detecting the presence of genomic DNA, cDNA or mRNA of a mycobacterium in a biological sample, comprising:

a) a substrate on which a plurality of BAC vectors according to the invention or purified polynucleotides according to the invention have been immobilized;

b) optionally, the reagents necessary to perform the hybridization reaction.

Additionally, the recombinant BAC vectors according to the invention and the polynucleotide inserts contained therein may be used for performing detection methods based on "molecular combing". Said methods consist in methods for aligning macromolecules, especially DNA, and are applied to processes for detecting, for measuring intramolecular distance, for separating and/or for assaying a macromolecule, especially DNA, in a sample.

These "molecular combing" methods are simple methods, in which the triple line S/A/B (meniscus) resulting from the contact between a solvent A, the surface S and a medium B is caused to move on the said surface S, the said macromolecules (i.e. DNA) having a part, especially an end, anchored on the surface S, the other part, especially the other end, being in solution in the solvent A. These methods are fully described in the PCT Application n^(o) PCT/FR 95/00165 filed on Feb. 11, 1994 (Bensimon et al.).

When performing the "molecular combing" method with the recombinant BAC vectors according to the invention or their polynucleotide inserts, the latter may be immobilized ("anchored") on a suitable substrate and aligned as described in the PCT Application n^(o) PCT/FR 95/00165, the whole teachings of this PCT Application being herein incorporated by reference. Then, polynucleotides to be tested, preferably in the form of radioactively or non-radioactively labeled polynucleotides, that may consist of fragments of genomic DNA, cDNA etc., are brought into contact with the previously aligned polynucleotides according to the present invention, and their hybridization position on the aligned DNA molecules is then determined using any suitable means, including a microscope or a suitable camera device.

Thus, the present invention is also directed to a method for the detection of the presence of a polynucleotide of mycobacterial origin in a biological sample and/or for physical mapping of a polynucleotide on a genomic DNA, said method comprising:

a) aligning at least one polynucleotide contained in a recombinant BAC vector according to the invention on the surface of a substrate;

b) bringing into contact at least one polynucleotide to be tested with the substrate on which the at least one polynucleotide of step a) has been aligned;

c) detecting the presence and/or the location of the tested polynucleotide on the at least one aligned polynucleotide of step a).

The invention finally provides for a kit for performing the above method, comprising:

a) a substrate whose surface has at least one polynucleotide contained in a recombinant BAC vector according to the invention;

b) optionally, reagents necessary for labeling DNA;

c) optionally, reagents necessary for performing a hybridization reaction.

In conclusion, it may be underlined that the combination of BAC-based approaches such as those described in the present specification with the advances in comparative genomics brought about by the availability of an increased number of complete genomes, and with the rapid increase of well-characterized gene products in the public databases, will allow one skilled in the art to carry out an exhaustive analysis of the mycobacterial genome.

Materials and Methods

1. DNA-preparation. Preparation of M tuberculosis H37Rv DNA in agaroseplugs was conducted as previously described (Canard et al., 1989;Philipp et al., 1996b). Plugs were stored in 0.2 M EDTA at 4° C. andwashed 3 times in 0.1% Triton X-100 buffer prior to use.

2. BAC vector preparation. pBeloBAC11 was kindly provided by Dr.Shizuya, Department of Biology, California Institute of Technology(Pasadena, Calif.). The preparation followed the description of Woo etal., 1994 (Woo et al., 1994).

3. Partial digestion with HindIII. Partial digestion was carried out onplugs, each containing approximately 10 μg of high molecular weight DNA,after three one hour equilibration steps in 50 ml of HindIII 1×digestion buffer (Boehringer Mannheim, Mannheim, Germany) plus 0.1%Triton X-100. The buffer was then removed and replaced by 1 ml/plug ofice-cold HindIII enzyme buffer containing 20 units of HindIII(Boehringer). After two hours incubation on ice, the plugs weretransferred to a 37° C. water bath for 30 minutes. Digestions werestopped by adding 500 μl of 50 mM EDTA (pH 8.0).

4. Size selection. The partially digested DNA was subjected to contour-clamped homogeneous electric field (CHEF) electrophoresis on a 1% agarose gel using a BioRad DR III apparatus (BioRad, Hercules, Calif.) in 1× TAE buffer at 13° C., with a ramp from 3 to 15 seconds at 6 V/cm for 16 hours. Agarose slices from 25 to 75 kb, 75 to 120 kb and 120 to 180 kb were excised from the gel and stored in TE at 4° C.

5. Ligation and transformation. Agarose slices containing fractions from 25 to 75 kb, 75 to 120 kb and 120 to 180 kb were melted at 65° C. for 10 minutes and digested with Gelase (Epicentre Technologies, Madison, Wis.), using 1 unit per 100 μl of gel slice. 25–100 ng of the size-selected DNA was then ligated to 10 ng of HindIII-digested, dephosphorylated pBeloBAC11 in a 1:10 molar ratio using 10 units of T4 DNA ligase (New England Biolabs, Beverly, Mass.) at 16° C. for 20 hours. Ligation mixtures were heated at 65° C. for 15 minutes, then drop-dialysed against TE using Millipore VS 0.025 μm membranes (Millipore, Bedford, Mass.). Fresh electrocompetent E. coli DH10B cells (Sheng et al., 1995) were harvested from 200 ml of a mid-log (OD₅₅₀=0.5) culture grown in SOB medium. Cells were washed three times in ice-cold water, and finally resuspended in ice-cold water to a cell density of 10¹¹ cells/ml (OD₅₅₀=150). 1 μl of the ligation mix was used for electroporation of 30 μl of electrocompetent DH10B E. coli using a Eurogentec Easyject Plus electroporator (Eurogentec, Seraing, Belgium), with settings of 2.5 kV, 25 μF, and 99 Ω, in 2 mm wide electroporation cuvettes. After electroporation, cells were resuspended in 600 μl of SOC medium, allowed to recover for 45 minutes at 37° C. with gentle shaking, and then plated on LB agar containing 12.5 μg/ml chloramphenicol (CM), 50 μg/ml X-gal, and 25 μg/ml IPTG. The plates were incubated overnight and recombinants (white colonies) were picked manually to 96-well plates. Each clone was inoculated 3 times (2×200 μl and 1×100 μl of 2YT/12.5 μg/ml CM per clone) and incubated overnight. One of the microtiter plates, containing 100 μl of culture per well, was maintained as a master plate at −80° C. after 100 μl of 80% glycerol was added to each well, while minipreps (Sambrook et al., 1989) were prepared from the remaining two plates to check for the presence of inserts. Clones containing inserts were then designated “Rv” clones and repicked from the master plate to a second set of plates for storage of the library at −80° C.

6. Preparation of DNA for sizing, direct sequencing and comparative genomics. A modified Birnboim and Doly protocol (Birnboim et al., 1979) was used for extraction of plasmid DNA for sequencing purposes. Each Rv clone was inoculated into a 50 ml Falcon polypropylene tube containing 40 ml of 2YT medium with 12.5 μg/ml of CM and grown overnight at 37° C. with shaking. Cells were harvested by centrifugation and stored at −20° C. The frozen pellet was resuspended in 4 ml of solution A (50 mM glucose, 10 mM EDTA, 25 mM Tris, pH 8.0) and 4 ml of freshly prepared solution B (0.2 M NaOH, 0.2% SDS) was then added. The solution was gently mixed and kept at room temperature for 5 minutes before adding 4 ml of ice-cold solution C (3 M sodium acetate, pH 4.7). Tubes were kept on ice for 15 min, and centrifuged at 10,000 rpm for 15 min. After isopropanol precipitation, the DNA pellet was dissolved in 600 μl of RNase solution (15 mM Tris HCl pH 8.0, 10 μg/ml RNase A). After 30 minutes at 37° C., the DNA solution was extracted with chloroform:isoamyl alcohol (24:1) and precipitated from the aqueous phase using isopropanol. The DNA pellet was then rinsed with 70% ethanol, air-dried and dissolved in 30 μl of distilled water. In general, DNA prepared by this method was clean and concentrated enough to give good quality results by automatic sequencing (at least 300 bp of sequence). For a few DNA preparations, an additional polyethylene glycol (PEG) precipitation step was necessary, which was performed as follows. The 30 μl of DNA solution was diluted to 64 μl, mixed gently and precipitated using 16 μl of 4 M NaCl and 80 μl of 13% PEG 8000. After 30 min on ice, the tubes were centrifuged at 4° C., and the pellet was carefully rinsed with 70% ethanol, air-dried and dissolved in 20 μl of distilled water.
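The final concentrations in the optional PEG precipitation follow from the stated volumes by simple dilution arithmetic. The short sketch below works this out; the function and variable names are illustrative only and are not part of the described protocol.

```python
def final_concentration(stock, stock_volume_ul, total_volume_ul):
    """Concentration after dilution into the total volume (C1 * V1 = C2 * V2)."""
    return stock * stock_volume_ul / total_volume_ul


# Volumes stated in the PEG precipitation step.
dna_ul = 64.0   # DNA solution after dilution from 30 ul
nacl_ul = 16.0  # 4 M NaCl
peg_ul = 80.0   # 13% PEG 8000
total_ul = dna_ul + nacl_ul + peg_ul

nacl_final_molar = final_concentration(4.0, nacl_ul, total_ul)
peg_final_percent = final_concentration(13.0, peg_ul, total_ul)

print(f"total volume: {total_ul:.0f} ul")
print(f"NaCl: {nacl_final_molar:.2f} M, PEG 8000: {peg_final_percent:.1f}%")
```

Under these volumes the precipitation mix is 160 μl, with 0.4 M NaCl and 6.5% PEG 8000.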

7. Sizing of inserts. Insert sizes were determined by pulsed-field gel electrophoresis (PFGE) after cleavage with DraI (Promega). 100–200 ng of DNA was DraI-cleaved in 20 μl total reaction volume, following the manufacturer's recommendations, then loaded onto a 1% agarose gel and migrated using a pulse of 4 s for 15 h at 6.25 V/cm at 10° C. on an LKB-Pharmacia CHEF apparatus. Mid-range and low-range PFGE markers (New England Biolabs) were used as size standards. Insert sizes were estimated after ethidium bromide staining of gels.

8. Direct sequencing. For each sequencing reaction, 7 μl of BAC DNA (300–500 ng), 2 μl of primer (2 μM), 8 μl of reaction mix from the Taq DyeDeoxy Terminator cycle sequencing kit (Applied Biosystems) and 3 μl of distilled water were used.

After 26 cycles (96° C. for 30 sec; 56° C. for 15 sec; 60° C. for 4 min) in a thermocycler (MJ Research Inc., Watertown, Mass.), DNA was precipitated using 70 μl of 70% ethanol/0.5 mM MgCl₂, centrifuged, rinsed with 70% ethanol, dried and dissolved in 2 μl of formamide/EDTA buffer. SP6 and T7 samples of 32 BAC clones were loaded onto 64-lane, 6% polyacrylamide gels and electrophoresis was performed on a Model 373A automatic DNA sequencer (Applied Biosystems) for 12 to 16 hours. The sequences of the oligonucleotides used as primers are shown in Table 1.

9. DOP-PCR. As an alternative procedure, we used partially degenerate oligonucleotides in combination with vector-specific (SP6 or T7) primers to amplify insert ends of BAC clones, following a previously published protocol for P1 clones (Liu et al., 1995). The degenerate primers Deg2, Deg3, Deg4 and Deg6 (Table 1) gave the best results for selective amplification of insert termini.

Table 1: Primers used for PCRs and sequencing

Vector-specific primers for DOP-PCR, first amplification step:

SP6-BAC1: AGT TAG CTC ACT CAT TAG GCA (SEQ ID NO. 734)

T7-BAC1: GGA TGT GCT GCA AGG CGA TTA (SEQ ID NO. 735)

Vector-specific primers (direct sequencing, nested primer for second PCR step):

SP6 Mid: AAA CAG CTA TGA CCA TGA TTA CGC CAA (SEQ ID NO. 736)

T7-Belo2: TCC TCT AGA GTC GAC CTG CAG GCA (SEQ ID NO. 737)

Degenerate Primers:

Deg2: TCT AGA NNN NNN TCC GGC (SEQ ID NO. 738)

Deg3: TCT AGA NNN NNN GGG CCC (SEQ ID NO. 739)

Deg4: CGT TTA AAN NNN NWA GGC CG (SEQ ID NO. 740)

Deg6: GGT ACT AGT NNN NNW TCC GGC (SEQ ID NO. 741)

Primers used for the amplification of M. bovis DNA in the polymorphic chromosomal region of Rv58:

Primer 1: ACG ACC TCA TAT TCC GAA TCC C (SEQ ID NO. 742)

Primer 2: GCA TCT GTT GAG TAC GCA CTT CC (SEQ ID NO. 743)
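The degenerate primers of Table 1 use the IUPAC ambiguity codes N (any base) and W (A or T), so each primer name stands for a pool of distinct oligonucleotides. The following sketch, given for illustration only, counts and enumerates the sequences in each pool. The helper names `degeneracy` and `expand` are assumptions introduced here, and only the ambiguity codes occurring in Table 1 are handled.

```python
from itertools import product

# IUPAC codes restricted to those that occur in the Table 1 primers.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "AT", "N": "ACGT"}

# Degenerate primers as listed in Table 1.
PRIMERS = {
    "Deg2": "TCT AGA NNN NNN TCC GGC",
    "Deg3": "TCT AGA NNN NNN GGG CCC",
    "Deg4": "CGT TTA AAN NNN NWA GGC CG",
    "Deg6": "GGT ACT AGT NNN NNW TCC GGC",
}


def degeneracy(primer):
    """Number of distinct sequences represented by a degenerate primer."""
    count = 1
    for base in primer.replace(" ", ""):
        count *= len(IUPAC[base])
    return count


def expand(primer):
    """Lazily yield every concrete sequence encoded by a degenerate primer."""
    choices = (IUPAC[base] for base in primer.replace(" ", ""))
    return ("".join(bases) for bases in product(*choices))


for name, primer in PRIMERS.items():
    length = len(primer.replace(" ", ""))
    print(f"{name}: {length} nt, {degeneracy(primer)} sequences in the pool")
```

Six N positions give 4⁶ = 4096 sequences for Deg2 and Deg3, while five N positions and one W give 4⁵ × 2 = 2048 sequences for Deg4 and Deg6.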

10. Screening by pooled PCR. To identify particular clones in the library which could not be detected by random end-sequencing of the 400 BAC clones, PCR screening of DNA pools was performed. Primers were designed for regions of the chromosome where no BAC coverage was apparent, using cosmid or H37Rv whole-genome shotgun sequences. Primers were designed to amplify approximately 400–500 bp. Ninety-six-well plates containing 200 μl of 2YT/12.5 μg/ml CM per well were inoculated with 5 μl of −80° C. glycerol stock culture each from the master plates and incubated overnight. The 96 clones of each plate were pooled by taking 20 μl of culture from each well, and this procedure was repeated for 31 plates. Pooled cultures were centrifuged, the pellets were resuspended in sterile water, boiled for 5 minutes and centrifuged, and the supernatants were kept for PCRs. As an initial screening step, the 31 pools, comprising a total of 2976 BACs and representing about two thirds of the library, were tested for the presence of a specific clone using appropriate PCR primers. PCR was performed using 10 μl of supernatant, 5 μl of assay buffer (100 mM β-mercaptoethanol, 600 mM Tris HCl (pH 8.8), 20 mM MgCl₂, 170 mM (NH₄)₂SO₄), 5 μl of dimethylsulfoxide (DMSO), 5 μl of dNTPs (20 mM), 5 μl of water, 10 μl of primer (2 μM), 10 μl of inverse primer (2 μM) and 0.2 units of Taq DNA polymerase (Boehringer). Thirty-two cycles of PCR (95° C. for 30 s, 55° C. for 1 min 30 s, 72° C. for 2 min) were performed after an initial denaturation at 95° C. for 1 min. An extension step at 72° C. for 5 min finished the PCR. If a pool of 96 clones yielded an appropriate PCR product (FIG. 1A), subpools were made to identify the specific clone. Subpools representing row A of a 96-well plate were made by pooling clones 1 to 12 of row A into a separate tube. Subpools for rows B to H were made in the same way. In addition, subpools of each of the 12 columns (containing 8 clones each) were made, so that 20 subpools were obtained for one 96-well plate. PCR with these 20 subpools identified the specific clone (FIG. 1B, lower gel portion). If more than one specific clone was present among the 96 clones of one plate (FIG. 1B, upper gel portion), additional PCR reactions had to be performed with the possible candidates (data not shown).
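The two-stage pooling scheme can be illustrated with a short sketch. It assumes the conventional 96-well layout, in which the eight subpools of 12 clones correspond to plate rows A to H and the twelve subpools of 8 clones correspond to plate columns 1 to 12. The well names, clone identifiers and function names are hypothetical and serve only to show how positive subpools point to candidate wells.

```python
ROWS = "ABCDEFGH"
COLUMNS = range(1, 13)


def make_subpools(plate):
    """Split a plate (well name -> clone id) into 8 row pools and 12 column pools."""
    row_pools = {r: [plate[f"{r}{c}"] for c in COLUMNS] for r in ROWS}
    column_pools = {c: [plate[f"{r}{c}"] for r in ROWS] for c in COLUMNS}
    return row_pools, column_pools


def candidate_wells(positive_rows, positive_columns):
    """Wells lying at the intersection of PCR-positive row and column subpools."""
    return [f"{r}{c}" for r in positive_rows for c in positive_columns]


# Hypothetical plate with placeholder clone identifiers.
plate = {f"{r}{c}": f"clone_{r}{c}" for r in ROWS for c in COLUMNS}
row_pools, column_pools = make_subpools(plate)

print(31 * len(plate))                       # clones screened in 31 plate pools
print(len(row_pools) + len(column_pools))    # subpools per plate
print(candidate_wells("C", [7]))             # one positive row and one column
print(candidate_wells("CF", [2, 7]))         # two positives in each dimension
```

With a single positive clone, one row subpool and one column subpool identify the well directly. With two positive clones in different rows and columns, four intersections are possible, which is why further PCR reactions on the individual candidates are then needed.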

11. Genomic comparisons. DNA from the BAC clone Rv58 was digested with the restriction endonucleases EcoRI and PvuII, and resolved by agarose gel electrophoresis at low voltage overnight (1.5 V/cm). DNA was transferred via the method of Southern to nitrocellulose membranes (Hybond C extra, Amersham) following standard protocols (Sambrook et al., 1989), then fixed to the membranes at 80° C. for 2 hours. The blot was hybridized with ³²P-labelled total genomic DNA from M. tuberculosis H37Rv, M. bovis type strain (ATCC 19210) or M. bovis BCG Pasteur. Hybridization was performed at 37° C. overnight in 50% formamide hybridization buffer as previously described (Philipp et al., 1996b). Results were interpreted from the autoradiograms.

12. Computer analysis. Sequence data from the ABI373A automated sequencer were transferred as binary data to a Digital Alpha 200 station or Sun SparcII station and analysed using TED, a sequence analysis program from the Staden software package (Dear et al., 1991). Proof-read sequences were compared using the BLAST programs (Altschul et al., 1990) to the M. tuberculosis H37Rv sequence databases of the Sanger Centre, containing the collected cosmid sequences (TB.dbs) and whole-genome shotgun reads (TB_shotgun_all.dbs) (http://www.sanger.ac.uk/). In addition, local databases containing 1520 cosmid end-sequences and the accumulating BAC end-sequences were used to determine the exact location of end-sequenced BACs on the physical and genetic map. MycDB (Bergh et al., 1994) and public databases (EMBL, GenBank) were also used to compare new sequences, but to a lesser extent. The organization of the open reading frames (ORFs) in the polymorphic region of clone Rv58 was determined using the DIANA software established at the Sanger Centre.

EXAMPLES

Example 1 Construction of a pBeloBAC11 Library of M. tuberculosis H37Rv

Partial HindIII fragments of H37Rv DNA in the size range of 25 to 180 kb were ligated into pBeloBAC11 and electroporated into E. coli strain DH10B. While cloning of fractions I (25 to 75 kb) and II (75 to 120 kb) gave approximately 4×10⁴ transformants (white colonies), cloning of fraction III (120 to 180 kb) repeatedly resulted in empty clones. Parallel cloning experiments using partial HindIII digests of human DNA resulted in stable inserts for all three fractions (data not shown), suggesting that the maximum size of large inserts in BAC clones is strongly dependent on the source of the DNA. Analysis of the clones for the presence of inserts revealed that 70% of the clones had an insert of the appropriate size, while the remaining 30% of white colonies represented empty or lacZ′-mutated clones. Size determination of randomly selected, DraI-cleaved BACs via PFGE showed that the insert sizes of the majority of the clones ranged between 40 kb and 100 kb, with an average size of 70 kb. Clones with inserts of appropriate size were designated with “Rv” numbers, recultured and stored at −80° C. for further use.

Example 2 Direct DNA Sequence Analysis of BACs

To characterize the BAC clones, they were systematically subjected to insert termini sequencing. Two approaches, direct sequencing of BAC DNA and PCR with degenerate oligonucleotide primers (DOP), adapted to the high G+C content of mycobacterial DNA, were used. In a first screening phase, 50 BAC clones designated Rv1 to Rv50 were analysed using both methods in parallel. Except for two clones, where the sequences diverged significantly, the sequences obtained by the two methods only differed in length. Sequences obtained directly were on average about 350 bp long, and for 95% of the clones both the SP6 and T7 end-sequences were obtained at the first attempt. Sequences obtained by DOP-PCR were mostly shorter than 300 bp. For 40% of the BACs we obtained only very short amplicons of 50 to 100 base pairs from one end. In two cases the sequence obtained with the DOP-PCR differed from the sequences obtained by direct sequencing, and in these cases E. coli or vector sequences were amplified (data not shown). Taking the advantages and disadvantages of both methods into account, we decided to use direct termini sequencing for the systematic determination of the SP6 and T7 end-sequences.

Example 3 Representativity of the Library

After having determined the end-sequences of 400 BACs, a certain redundancy was seen. The majority of clones were represented at least 3 to 4 times. Maximum redundancy was seen in the vicinity of the unique rrn operon, as 2.5% of the clones carried identical fragments that bridge the cosmids Y50 and Y130 (FIG. 3, approximate position at 1440 kb). The majority of clones with identical inserts appeared as two variants, corresponding to both possible orientations of the HindIII fragment in pBeloBAC11. This suggests that the redundancy was not the result of amplification during library construction, but was due to the limited number of possible combinations of partial HindIII fragments in the given size range of 25 to 120 kb. To detect rare BAC clones, a pooled PCR protocol was used. Primers were designed on the basis of the existing cosmid sequences and used to screen 31 pools of 96 BAC clones. When positive PCR products of the correct size were obtained, smaller subpools (of 8 or 12 clones each) of the corresponding pool were subsequently used to identify the corresponding clone (FIGS. 1A and 1B). With this approach, 20 additional BACs (Rv401–Rv420) were found for the regions where no BACs were found with the initial systematic sequencing approach. The end-sequences of these BACs (Rv401–Rv420) were determined by direct sequencing, which confirmed the predicted location of the clones on the chromosome. A 97% coverage of the genome of H37Rv with BAC clones was obtained. Only one region of ~150 kb was apparently not represented in the BAC library, as screening of all pools with several sets of specific primers did not reveal the corresponding clone. This was probably due to the fact that HindIII fragments of mycobacterial DNA larger than 110 kb are very difficult to establish in E. coli and that a HindIII fragment of ~120 kb is present in this region of the chromosome (data not shown).

Example 4 Establishing a BAC Map

Using all end-sequence and shotgun-sequence data from the H37Rv genome sequencing project, most of the BAC clones could then be localized by sequence comparison on the integrated map of the chromosome of M. tuberculosis strain H37Rv (Philipp et al., 1996b), and an ordered physical map of the BAC clones was established. PCR with primers from the termini sequences of selected BACs was used for chromosomal walking and confirmation of overlapping BACs (data not shown). The correct order of BACs on the map was also confirmed more recently, using 40,000 whole-genome shotgun reads established at the Sanger Centre. In addition, pulsed-field gel electrophoresis of DraI digests of selected BACs was performed (FIG. 2) in order to see if the approximate fragment size and the presence or absence of DraI cleavage sites in the insert were consistent with the location of the BACs on the physical map (FIG. 3). Comparison of the sequence-based BAC map with the physical and genetic map, established by PFGE and hybridization experiments (Philipp et al., 1996b), showed that the two maps were in good agreement. The positions of 8 genetic markers previously shown on the physical and genetic map were directly confirmed by BAC end-sequence data (Table 2, FIG. 3). The position of 43 of 47 Y-clones (91%) shown on the physical and genetic map, which were later shotgun sequenced, was confirmed by the BAC end-sequences and shotgun sequence data. Four clones (Y63, Y180, Y251, and Y253) were located at different positions than previously thought, and this was found to be due to bookkeeping errors or to chimeric inserts. Their present approximate location relative to oriC is shown in FIG. 3: Y63 at 380 kb, Y63A at 2300 kb, Y180 at 2160 kb, Y251 at 100 kb, and Y253 at 2700 kb. A total of 48 BACs, covering regions of the chromosome not represented by cosmids, were then shotgun sequenced (Cole et al., 1997), and these are squared in FIG. 3. No chimeric BACs were found, which is consistent with the observations of other research groups for other BAC libraries (Cai et al., 1995; Zimmer et al., 1997). The absence of chimeric BACs was of particular importance for the correct assembly of the M. tuberculosis H37Rv sequence. The exact position of the BAC termini sequences on the chromosome will be available via the world wide web (http://www.pasteur.fr/MycDB).
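Once the end-sequences place each BAC at a pair of chromosomal coordinates, a minimal overlapping set of clones can be selected computationally. The sketch below uses a standard greedy interval-cover strategy and is given purely as an illustration: it is not the procedure used to establish the map of FIG. 3, and the clone names and coordinates are hypothetical.

```python
def minimal_tiling_path(clones, start, end):
    """Greedy interval cover of [start, end].

    clones maps a clone name to (left, right) map positions in kb.
    At each step the clone that overlaps the covered region and reaches
    furthest to the right is chosen. Raises ValueError if a gap exists.
    """
    path = []
    covered = start
    while covered < end:
        best = None
        for name, (left, right) in clones.items():
            if left <= covered < right:
                if best is None or right > clones[best][1]:
                    best = name
        if best is None:
            raise ValueError(f"gap in coverage at {covered} kb")
        path.append(best)
        covered = clones[best][1]
    return path


# Hypothetical clones with positions in kb (illustrative values only).
example_clones = {
    "bacA": (0, 70),
    "bacB": (40, 110),
    "bacC": (60, 140),
    "bacD": (100, 180),
    "bacE": (130, 210),
}

print(minimal_tiling_path(example_clones, 0, 200))
```

For the example values, three of the five clones (bacA, bacC and bacE) suffice to span the 200 kb interval. A region lacking any overlapping clone is reported as a gap, which corresponds to a region for which additional clones would have to be sought, for instance by pooled PCR screening.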

Table 2: Identities of genetic markers previously shown on the integrated and genetic map of H37Rv (Philipp et al., 1996b) which showed perfect sequence homology with BAC end-sequences.

Locus | BAC end sequence | Description of genetic marker | Organism | GenBank Accession n°
apa | Rv163SP6 | Secreted alanine-proline-rich antigen | M. tuberculosis | X80268
dnaJ, dnaK | Rv164T7 | DnaJ hsp | M. leprae | M95576
fop-A | Rv136T7 | Fibronectin binding protein | M. tuberculosis | M27016
polA | Rv401T7 | DNA polymerase I | M. tuberculosis | L11920
ponA | Rv273T7 | Penicillin binding protein | M. leprae | S82044
pstC | Rv103T7 | Putative phosphate transport receptor | M. tuberculosis | Z48057
recA | Rv415SP6 | Homologous recombination | M. tuberculosis | X58485
wag9 | Rv35SP6 | 35-kDa antigen | M. tuberculosis | M69187

Example 5 Repetitive End-Sequences

Repetitive sequences can seriously confound mapping and sequence assembly. In the case of the BAC end-sequences, no particular problems with repetitive sequences were observed. Although nine clones with one end in an IS1081 (Collins et al., 1991) sequence were identified, it was possible to correctly locate their position on the map using the sequence of the second terminus. Moreover, these BACs were used to determine the exact locations of IS1081 sequences on the map. Five copies of this insertion sequence, which harbors a HindIII cleavage site, were mapped on the previous physical and genetic map. In contrast, BAC end-sequence data revealed an additional copy of IS1081 on the M. tuberculosis H37Rv chromosome. The additional copy was identified by six clones (Rv27, Rv118, Rv142, Rv160, Rv190, Rv371) which harbored an identical fragment linking Y50 to 1364 (FIG. 3, at ~1380 kb). This copy of IS1081 was not found by previous hybridization experiments, probably because it is located near another copy of IS1081, localized on the same DraI fragment Z7 and AsnI fragment U (FIG. 3, at ~1140 kb). Furthermore, the position of a copy of IS1081 previously shown in DraI fragment Y1 (FIG. 3, at 1840 kb) had to be changed to the region of Y349 (FIG. 3, at ~3340 kb) according to the end-sequences of BAC Rv223. The positions of the four other IS1081 copies were confirmed by the sequence data and therefore remained unchanged. In total, 6 copies of IS1081 were identified in the H37Rv genome, in agreement with the findings of others (Collins et al., 1991).

In addition, a sequence of 1165 bp in length containing a HindIII site was found in two copies in the genome of H37Rv in different regions. The end-sequences of BAC clones Rv48 and Rv374, covering cosmid Y164, as well as Rv419 and Rv45, which cover cosmid Y92, had perfect identity with the corresponding parts of this 1165 bp sequence (FIG. 3, at ~3480 kb and ~900 kb). Analysis of the sequence did not reveal any homology with insertion sequences or other repetitive elements. However, as each of the two locations showed appropriate BAC coverage, chimerism of the sequenced cosmids Y164 and Y92 can be ruled out as the probable cause.

Example 6 Using BAC Clones in Comparative Genomics

The minimal overlapping set of BAC clones represents a powerful tool for comparative genomics. For example, with each BAC clone containing on average an insert of 70 kb, it should be possible to cover a 1 Mb section of the chromosome with 15 BAC clones. Restriction digests of overlapping clones can then be blotted to membranes, and probed with radiolabelled total genomic DNA from, for example, M. bovis BCG Pasteur. Restriction fragments that fail to hybridize with the M. bovis BCG Pasteur DNA must be absent from its genome, hence identifying polymorphic regions between M. bovis BCG Pasteur and M. tuberculosis H37Rv. The results of such an analysis with clone Rv58 (FIG. 3, at ~1680 kb) are shown here. This clone covers a previously described polymorphic genomic region between M. tuberculosis and M. bovis BCG strains (Philipp et al., 1996a). EcoRI and PvuII digests from clone Rv58, fixed on nitrocellulose membranes, were hybridized with ³²P-labelled total genomic DNA from M. tuberculosis H37Rv, M. bovis (ATCC 19210), and M. bovis BCG Pasteur. FIGS. 4A and 4B present the results of this analysis, where it is clear that several restriction fragments from clone Rv58 failed to hybridize with genomic DNA from either M. bovis or M. bovis BCG Pasteur. On the basis of the various missing restriction fragments, a restriction map of the polymorphic region was established and compared to the H37Rv sequence data. The localization of the polymorphism could therefore be estimated, and appropriate oligonucleotide primers (Table 1) were selected for the amplification and sequencing of the corresponding region in M. bovis. The alignment of M. bovis and M. tuberculosis H37Rv sequences showed that 12,732 bp were absent from the chromosomal region of the M. bovis type strain and M. bovis BCG Pasteur strain. The G+C content of the polymorphic region is 62.3 mol %, which is the same as the average G+C content of the M. tuberculosis genome, hence indicating that this region is not a prophage or other such insertion. Subsequent PCR studies revealed that this segment was also absent from the Danish, Russian, and Glaxo substrains of M. bovis BCG, suggesting that this polymorphism can be used to distinguish M. bovis from M. tuberculosis. Analysis of this sequence showed that 11 putative open reading frames (ORFs) are present in M. tuberculosis, corresponding to ORFs MTCY277.28 to MTCY277.38 (accession number Z79701, EMBL Nucleotide Sequence Data Library) (FIG. 5). FASTA searches against the protein and nucleic acid databases revealed that the genes of this region may be involved in polysaccharide biosynthesis. Among these putative genes, the highest score was seen with ORF 6 (MTCY277.33), whose putative product shows a 51.9% identity with GDP-D-mannose dehydratase from Pseudomonas aeruginosa (accession number U18320, EMBL Nucleotide Sequence Data Library) in a 320 amino acid overlap. The novel M. bovis sequence of the polymorphic region was deposited under accession number AJ003103 in the EMBL Nucleotide Sequence Data Library.
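The reasoning of this example can be summarized in a short sketch covering three points: the number of 70 kb clones needed to span 1 Mb, the logic of the differential hybridization, and the length of the polymorphic segment. The chromosomal coordinates are those recited for SEQ ID NO:1 in the claims. The fragment names and hybridization signals are hypothetical placeholders and do not reproduce the data of FIGS. 4A and 4B, and the function names are illustrative.

```python
import math


def clones_needed(region_kb, mean_insert_kb):
    """Smallest number of end-to-end inserts that can span a region."""
    return math.ceil(region_kb / mean_insert_kb)


def segment_length(first_nt, last_nt):
    """Length in bp of a segment given inclusive first and last coordinates."""
    return last_nt - first_nt + 1


def non_hybridizing(signals, reference, test):
    """Fragments detected with the reference probe but not with the test probe."""
    return [
        fragment
        for fragment, seen in signals[reference].items()
        if seen and not signals[test][fragment]
    ]


# Hypothetical hybridization calls for three restriction fragments.
signals = {
    "H37Rv": {"frag1": True, "frag2": True, "frag3": True},
    "BCG Pasteur": {"frag1": True, "frag2": False, "frag3": True},
}

print(clones_needed(1000, 70))
print(segment_length(1_696_015, 1_708_746))
print(non_hybridizing(signals, "H37Rv", "BCG Pasteur"))
```

The first two results agree with the figures given in the text: 15 clones for a 1 Mb section, which is a lower bound since overlapping clones are required in practice, and 12,732 bp for the segment absent from M. bovis. The count of 11 ORFs likewise matches the inclusive range MTCY277.28 to MTCY277.38.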

As is apparent from the teachings of the specification, the invention is not limited in scope to one or several of the embodiments detailed above; the present invention also embraces all the alternatives that can be performed by one skilled in the same technical field, without deviating from the subject or from the scope of the instant invention.

1. A purified polynucleotide, comprising an Open Reading Frame contained within SEQ ID NO:1, wherein SEQ ID NO:1 consists of nucleotide 1,696,015 through nucleotide 1,708,746 of the Mycobacterium tuberculosis chromosome, and wherein the polynucleotide is selected from: (a) nucleotide 1,696,015 through nucleotide 1,697,420 of the Mycobacterium tuberculosis chromosome; (b) nucleotide 1,696,015 through nucleotide 1,699,892 of the Mycobacterium tuberculosis chromosome; (c) nucleotide 1,696,015 through nucleotide 1,701,088 of the Mycobacterium tuberculosis chromosome; (d) nucleotide 1,696,015 through nucleotide 1,702,588 of the Mycobacterium tuberculosis chromosome; (e) nucleotide 1,696,015 through nucleotide 1,704,091 of the Mycobacterium tuberculosis chromosome; (f) nucleotide 1,696,015 through nucleotide 1,705,056 of the Mycobacterium tuberculosis chromosome; (g) nucleotide 1,696,015 through nucleotide 1,705,784 of the Mycobacterium tuberculosis chromosome; (h) nucleotide 1,696,015 through nucleotide 1,706,593 of the Mycobacterium tuberculosis chromosome; (i) nucleotide 1,696,015 through nucleotide 1,707,524 of the Mycobacterium tuberculosis chromosome; or (j) nucleotide 1,696,015 through nucleotide 1,708,648 of the Mycobacterium tuberculosis chromosome.

2. A purified polynucleotide, comprising an Open Reading Frame contained within SEQ ID NO:1, wherein SEQ ID NO:1 consists of nucleotide 1,696,015 through nucleotide 1,708,746 of the Mycobacterium tuberculosis chromosome, and wherein the polynucleotide is selected from: (a) nucleotide 1,696,728 through nucleotide 1,708,746 of the Mycobacterium tuberculosis chromosome; (b) nucleotide 1,698,096 through nucleotide 1,708,746 of the Mycobacterium tuberculosis chromosome; (c) nucleotide 1,700,210 through nucleotide 1,708,746 of the Mycobacterium tuberculosis chromosome; (d) nucleotide 1,701,293 through nucleotide 1,708,746 of the Mycobacterium tuberculosis chromosome; (e) nucleotide 1,703,072 through nucleotide 1,708,746 of the Mycobacterium tuberculosis chromosome; (f) nucleotide 1,704,091 through nucleotide 1,708,746 of the Mycobacterium tuberculosis chromosome; (g) nucleotide 1,705,056 through nucleotide 1,708,746 of the Mycobacterium tuberculosis chromosome; (h) nucleotide 1,705,808 through nucleotide 1,708,746 of the Mycobacterium tuberculosis chromosome; (i) nucleotide 1,706,631 through nucleotide 1,708,746 of the Mycobacterium tuberculosis chromosome; or (j) nucleotide 1,707,530 through nucleotide 1,708,746 of the Mycobacterium tuberculosis chromosome.

3. A purified polynucleotide, comprising an Open Reading Frame contained within SEQ ID NO:1, wherein SEQ ID NO:1 consists of nucleotide 1,696,015 through nucleotide 1,708,746 of the Mycobacterium tuberculosis chromosome, and wherein the polynucleotide is selected from: (a) nucleotide 1,696,015 through nucleotide 1,698,096 of the Mycobacterium tuberculosis chromosome; (b) nucleotide 1,696,015 through nucleotide 1,700,210 of the Mycobacterium tuberculosis chromosome; (c) nucleotide 1,696,015 through nucleotide 1,701,293 of the Mycobacterium tuberculosis chromosome; (d) nucleotide 1,696,015 through nucleotide 1,703,072 of the Mycobacterium tuberculosis chromosome; (e) nucleotide 1,696,015 through nucleotide 1,704,091 of the Mycobacterium tuberculosis chromosome; (f) nucleotide 1,696,015 through nucleotide 1,705,056 of the Mycobacterium tuberculosis chromosome; (g) nucleotide 1,696,015 through nucleotide 1,705,808 of the Mycobacterium tuberculosis chromosome; (h) nucleotide 1,696,015 through nucleotide 1,706,631 of the Mycobacterium tuberculosis chromosome; or (i) nucleotide 1,696,015 through nucleotide 1,707,530 of the Mycobacterium tuberculosis chromosome.

4. A purified polynucleotide, comprising an Open Reading Frame contained within SEQ ID NO:1, wherein SEQ ID NO:1 consists of nucleotide 1,696,015 through nucleotide 1,708,746 of the Mycobacterium tuberculosis chromosome, and wherein the polynucleotide is selected from: (a) nucleotide 1,696,441 through nucleotide 1,708,746 of the Mycobacterium tuberculosis chromosome; (b) nucleotide 1,697,420 through nucleotide 1,708,746 of the Mycobacterium tuberculosis chromosome; (c) nucleotide 1,699,892 through nucleotide 1,708,746 of the Mycobacterium tuberculosis chromosome; (d) nucleotide 1,701,088 through nucleotide 1,708,746 of the Mycobacterium tuberculosis chromosome; (e) nucleotide 1,702,588 through nucleotide 1,708,746 of the Mycobacterium tuberculosis chromosome; (f) nucleotide 1,704,091 through nucleotide 1,708,746 of the Mycobacterium tuberculosis chromosome; (g) nucleotide 1,705,056 through nucleotide 1,708,746 of the Mycobacterium tuberculosis chromosome; (h) nucleotide 1,705,784 through nucleotide 1,708,746 of the Mycobacterium tuberculosis chromosome; (i) nucleotide 1,707,524 through nucleotide 1,708,746 of the Mycobacterium tuberculosis chromosome; or (j) nucleotide 1,706,593 through nucleotide 1,708,746 of the Mycobacterium tuberculosis chromosome.